Field Of The Invention

The invention relates to isolated nucleic acids and polypeptides derived from
Enterobacter cloacae that are useful as molecular targets for diagnostics, prophylaxis
and treatment of pathological conditions, as well as materials and methods for the
diagnosis, prevention, and amelioration of pathological conditions resulting from
bacterial infection.

Background Of The Invention

Enterobacter cloacae (E. cloacae) belongs to the bacterial family
Enterobacteriaceae, whose diverse members are Gram-negative rods that are glucose
fermenters and nitrate reducers. These organisms are found free-living in nature and
as part of the indigenous flora of humans and animals. They grow rapidly under
aerobic and anaerobic conditions and are metabolically active, utilizing a variety of
substrates. Most species are opportunistic pathogens (Kenneth Ryan,
Enterobacteriaceae, Chap. 20, Medical Microbiology, An Introduction to Infectious
Diseases, Second Edition, Editor, John C. Sherris, Elsevier, New York, 1990).
E. cloacae is an ornithine-positive, lysine-negative pathogen that can be
associated with urinary tract and respiratory tract infections. The bacterium produces
endotoxins which, as aerosols, can penetrate into the lungs, causing fever, coughing,
difficulty in breathing and wheezing (Fairley, T. and Gislason, S., 1986-1997,
Environmed Research Inc). E. cloacae is becoming progressively more common in
newborns in Neonatal Intensive Care Units (NICU) (Shi, Z.Y., et al., 1996, J. Clin.
Microbiol. 34:2784-2790; Cordero, L., et al., 1997, Pediatr. Infect. Dis. J. 16:18-23;
Acolet, D., et al., 1994, J. Hosp. Infect. 28:273-286). A study at Children's Hospital
in Michigan showed a four-fold increase in Enterobacter in patients with bacteremia
between 1989 and 1992. E. cloacae accounted for 74% of the isolates. Twenty-eight
percent of the infected children went into shock and six percent died (Andresen, J., et
al., 1994, Pediatr. Infect. Dis. J. 13:787-792). An outbreak of multidrug-resistant E.
cloacae lasted for 4 months in a NICU in China (Shi, Z.Y., et al., 1996, J. Clin.
Microbiol. 34:2784-2790). Outbreaks have also occurred in surgical wards (Burchard,
K.W., et al., 1986, Surgery 100:857-862) and burn units (Markowitz, S. M., et al.,
1983, J. Infect. Dis. 148:18-23). E. cloacae has also been shown to be the causative
agent in a case of gas gangrene (Fata, F., et al., 1996, South Med. J. 89:1095-1096).

The epidemiology of E. cloacae is not completely understood, although studies of
infection and colonization point to the endogenous flora of the patients. Molecular
typing results of 141 strains of E. cloacae from broad geographic areas in the United
States (from the National Surveillance Program: SCOPE) indicated that, although
clonal spread of a single strain was observed within a given institution, most of the
episodes of bacteremia were caused by strains unique to the individual patients.
Therefore, selection of mutant subpopulations within each endogenous infection can
be caused by drug exposure (Pfaller, M.A., 1997, Diagn. Microbiol. Infect. Dis.
28:211-219).

Antibiotic resistance is a major problem in the control of infectious diseases.
Strains of E. cloacae resistant to broad-spectrum penicillins and beta-lactamase-stable
cephalosporins occur at a frequency of 10^-7 to 10^-6 (Kadima, T.A. and Weiner, J.H.,
1997, Antimicrob. Agents Chemother. 41:2177-2183; Lampe, M.F., et al.,
Antimicrob. Agents Chemother. 21:655-660; Lindberg, F., et al., Rev. Infect. Dis. 8
[Suppl 3]:S292-S304). Selected fluoroquinolones have often been successfully
administered to patients with urinary tract infections; however, E. cloacae has become
resistant to many of them (Deguchi, T., et al., 1997, Antimicrob. Agents Chemother.
41:2544-2546). Some resistance has been attributed to plasmid-containing E. cloacae
and some to the E. cloacae chromosome. In Holland, two different resistant strains of
E. cloacae have been identified. The Amsterdam strain (resistant to cefotaxime and
piperacillin) exhibits depressed chromosomal Class 1 beta-lactamase, whereas the
Rotterdam strain (resistant to cefuroxime) favors the spread of a plasmid encoding
TEM-2 beta-lactamase (Namavar, F., 1997, BIO 99-53 99-606615). Resistant strains
of E. cloacae developed within 6 days in nearly 50% of the E. cloacae-infected
intensive care patients with pulmonary complications treated with cefotaxime (Fussle,
et al., 1994, Clin. Investig. 72:1015-1019). While several antimicrobial agents retain
potent activity against the highly resistant organisms (Pfaller, M.A., 1997, Diagn.
Microbiol. Infect. Dis. 28:211-219), constant exposure to these agents may eventually
result in resistance.

E. cloacae has been shown to be beneficial to plants in the control of diseases
caused by bacteria (Bacon, C.W., et al., PCT publication WO 97/24433). As a
biocontrol agent, E. cloacae coated onto cucumber seed has protected the seed from a
lethal infection of the fungus Pythium ultimum (Nelson, E.B., et al., 1992, Can. J. Plant
Pathol. 14:106-114). Nutritional mutants of E. cloacae were also protective and it has
been suggested that mutant strains would be beneficial for an environmental
containment strategy (Roberts, D.P., et al., 1994, Plant Science [Limerick], 10183-89).
Summary Of The Invention

The present invention fulfills the need for diagnostic tools and therapeutics by
providing bacterial-specific compositions and methods for detecting Enterobacter
species including E. cloacae, as well as compositions and methods useful for treating
and preventing Enterobacter infection, in particular, E. cloacae infection, in
vertebrates including mammals.

The present invention encompasses isolated nucleic acids and polypeptides
derived from E. cloacae that are useful as reagents for diagnosis of bacterial disease,
components of effective antibacterial vaccines, and/or as targets for antibacterial drugs
including anti-E. cloacae drugs. They can also be used to detect the presence of E.
cloacae and other Enterobacter species in a sample; and in screening compounds for
the ability to interfere with the E. cloacae life cycle or to inhibit E. cloacae infection.
They also have use as biocontrol agents for plants.

In one aspect, the invention features compositions of nucleic acids
corresponding to entire coding sequences of E. cloacae proteins, including surface or
secreted proteins or parts thereof, nucleic acids capable of binding mRNA from E.
cloacae proteins to block protein translation, and methods for producing E. cloacae
proteins or parts thereof using peptide synthesis and recombinant DNA techniques.
This invention also features antibodies and nucleic acids useful as probes to detect E.
cloacae infection. In addition, vaccine compositions and methods for the protection or
treatment of infection by E. cloacae are within the scope of this invention.

The nucleotide sequences provided in SEQ ID NO: 1 - SEQ ID NO: 5662, a
fragment thereof, or a nucleotide sequence at least about 99.5% identical to a
sequence contained within SEQ ID NO: 1 - SEQ ID NO: 5662 may be "provided" in
a variety of media to facilitate use thereof. As used herein, "provided" refers to a
manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide
sequence of the present invention, i.e., the nucleotide sequence provided in SEQ ID
NO: 1 - SEQ ID NO: 5662, a fragment thereof, or a nucleotide sequence at least
about 99.5% identical to a sequence contained within SEQ ID NO: 1 - SEQ ID NO:
5662. Uses for and methods for providing nucleotide sequences in a variety of media
are well known in the art (see e.g., EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any media which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media, such
as floppy discs, hard disc storage media, and magnetic tape; optical storage media
such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of
these categories such as magnetic/optical storage media. A person skilled in the art
can readily appreciate how any of the presently known computer readable media can
be used to create a manufacture comprising computer readable media having recorded
thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable media. A person skilled in the art can readily adopt any of the
presently known methods for recording information on computer readable media to
generate manufactures comprising the nucleotide sequence information of the present
invention.

A variety of data storage structures are available to a person skilled in the art
for creating a computer readable medium having recorded thereon a nucleotide sequence
of the present invention. The choice of the data storage structure will generally be
based on the means chosen to access the stored information. In addition, a variety of
data processor programs and formats can be used to store the nucleotide sequence
information of the present invention on computer readable media. The sequence
information can be represented in a word processing text file, formatted in
commercially-available software such as WordPerfect and Microsoft Word, or
represented in the form of an ASCII file, stored in a database application, such as
DB2, Sybase, Oracle, or the like. A person skilled in the art can readily adapt any
number of data processor structuring formats (e.g. text file or database) in order to
obtain computer readable media having recorded thereon the nucleotide sequence
information of the present invention.
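
As a minimal sketch of the two storage options named above (an ASCII text file and a database application), the following Python fragment renders sequence records as FASTA-style text and also loads them into a relational table. The record identifiers and residue strings are hypothetical placeholders, not entries from the Sequence Listing, and SQLite is used here only as a convenient stand-in for database applications such as DB2, Sybase, or Oracle; the FASTA layout is likewise an assumption, since the text does not prescribe a file format.

```python
import sqlite3

def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA-style ASCII text."""
    lines = []
    for ident, seq in records:
        lines.append(f">{ident}")
        # Wrap the residues at a fixed column width, as is conventional.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"

def store_in_database(records, path=":memory:"):
    """Store (identifier, sequence) pairs in a single relational table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(seq_id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records)
    conn.commit()
    return conn

# Hypothetical placeholder records (not real sequence data).
records = [("SEQ_ID_NO_1", "ATGGCTAAAGGTTAA"), ("SEQ_ID_NO_2", "ATGCCCGGGTTTTGA")]
ascii_text = to_fasta(records, width=10)
conn = store_in_database(records)
```

Which structure is preferable depends, as stated above, on the means chosen to access the stored information: a flat text file suits sequential scanning, whereas a keyed table suits retrieval of individual sequences by identifier.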

By providing the nucleotide sequence of SEQ ID NO: 1 - SEQ ID NO: 5662, a
fragment thereof, or a nucleotide sequence at least about 99.5% identical to SEQ ID
NO: 1 - SEQ ID NO: 5662 in computer readable form, a person skilled in the art can
routinely access the coding sequence information for a variety of purposes. Computer
software is publicly available which allows a person skilled in the art to access
sequence information provided in a computer readable medium. Examples of such
computer software include programs of the "Staden Package", "DNA Star",
"MacVector", GCG "Wisconsin Package" (Genetics Computer Group, Madison, WI)
and "NCBI Toolbox" (National Center For Biotechnology Information). Suitable
programs are described, for example, in Martin J. Bishop, ed., Guide to Human
Genome Computing, 2d Edition, Academic Press, San Diego, CA. (1998); and
Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New
Biology: Tools for Genomic and Molecular Research, American Society for
Microbiology, Washington, D.C. (1997).

Computer algorithms enable the identification of E. cloacae open reading
frames (ORFs) within SEQ ID NO: 1 - SEQ ID NO: 5662 which contain homology to
ORFs or proteins from other organisms. Examples of such similarity-search
algorithms include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and
Smith-Waterman [Smith and Waterman (1981) Advances in Applied Mathematics,
2:482-489] search algorithms. Suitable search algorithms are described, for example,
in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic
Press, San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne Harwood
Peruski, The Internet and the New Biology: Tools for Genomic and Molecular
Research, American Society for Microbiology, Washington, D.C. (1997). Such
algorithms are utilized on computer systems as exemplified below. The ORFs so
identified represent protein encoding fragments within the E. cloacae genome and are
useful in producing commercially important proteins such as enzymes used in
fermentation reactions and in the production of commercially useful metabolites.
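
For illustration only, candidate ORFs can be located in a nucleotide sequence before any similarity search is run. The sketch below scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified assumption of how such a scan might look, not the procedure used to analyze the sequences of the invention: the function names, the `min_codons` threshold, and the restriction to ATG start codons are all hypothetical choices, and alternative bacterial start codons are deliberately ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Find ATG...stop stretches in all six reading frames.

    Returns (strand, start, end, orf) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned. The codon
    count used for the min_codons filter includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume the search after the stop codon
    return orfs
```

Each ORF returned by such a scan could then be translated and submitted to a similarity search of the kind described above.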

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such systems are
designed to identify commercially important fragments of the E. cloacae genome. As
used herein, "a computer-based system" refers to the hardware means, software means,
and data storage means used to analyze the nucleotide sequence information of the
present invention. The minimum hardware means of the computer-based systems of
the present invention comprises a central processing unit (CPU), input means, output
means, and data storage means. A person skilled in the art can readily appreciate that
any one of the currently available computer-based systems is suitable for use in the
present invention. The computer-based systems of the present invention comprise a
data storage means having stored therein a nucleotide sequence of the present
invention and the necessary hardware means and software means for supporting and
implementing a search means. As used herein, "data storage means" refers to memory
which can store nucleotide sequence information of the present invention, or a
memory access means which can access manufactures having recorded thereon the
nucleotide sequence information of the present invention.
As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage means.
Search means are used to identify fragments or regions of the E. cloacae genome
which are similar to, or "match", a particular target sequence or target motif. A
variety of algorithms are known in the art and have been disclosed publicly,
and a variety of commercially available software for conducting homology-based
similarity searches is available and can be used in the computer-based systems of the
present invention. Examples of such software include, but are not limited to, FASTA
(GCG Wisconsin Package), Bic_SW (Compugen Bioccelerator), BLASTN2,
BLASTP2, BLASTX2 (NCBI) and Motifs (GCG). Suitable software programs are
described, for example, in Martin J. Bishop, ed., Guide to Human Genome Computing,
2d Edition, Academic Press, San Diego, CA. (1998); and Leonard F. Peruski, Jr., and
Anne Harwood Peruski, The Internet and the New Biology: Tools for Genomic and
Molecular Research, American Society for Microbiology, Washington, D.C. (1997).
A person skilled in the art can readily recognize that any one of the available
algorithms or implementing software packages for conducting homology searches can
be adapted for use in the present computer-based systems.
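
To make the notion of a search means concrete, the following is a minimal sketch of Smith-Waterman local alignment scoring, one of the algorithms cited above. It is not the Bic_SW or GCG implementation: the match, mismatch, and linear gap values are illustrative assumptions, and production tools typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic programming matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A search means built on such a function would score the target against each stored sequence and report those sequences whose score exceeds a chosen threshold.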

As used herein, a "target sequence" can be any DNA or amino acid sequence
of six or more nucleotides or two or more amino acids. A person skilled in the art can
readily recognize that the longer a target sequence is, the less likely a target sequence
will be present as a random occurrence in the database. The most preferred sequence
length of a target sequence is from about 10 to 100 amino acids or from about 30 to
300 nucleotide residues. However, it is well recognized that many genes are longer
than 500 amino acids, or 1.5 kb in length, and that commercially important fragments
of the E. cloacae genome, such as sequence fragments involved in gene expression
and protein processing, will often be shorter than 30 nucleotides.
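
The relationship between target length and chance occurrence can be made quantitative under a simplifying assumption that is not stated in the text: residues are treated as independent and equally frequent. Under that assumption, a target of length n drawn from an alphabet of size k is expected to occur by chance about (L - n + 1) * k^(-n) times in a database of L residues. The database length used below is a hypothetical round number, not a property of the E. cloacae genome.

```python
def expected_random_matches(target_length, database_length, alphabet_size):
    """Expected number of chance exact matches of a target in a database.

    Assumes independent, uniformly distributed residues, which real
    genomes only approximate.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions * alphabet_size ** (-target_length)

# Hypothetical database of 5,000,000 nucleotides (alphabet size 4).
six_mer = expected_random_matches(6, 5_000_000, 4)
thirty_mer = expected_random_matches(30, 5_000_000, 4)
```

With these assumed numbers, a six-nucleotide target is expected to occur by chance more than a thousand times, whereas a 30-nucleotide target is expected essentially never, which is consistent with the preferred minimum length stated above.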

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s) are
chosen based on a specific functional domain or three-dimensional configuration
which is formed upon the folding of the target polypeptide. There are a variety of
target motifs known in the art. Protein target motifs include, but are not limited to,
enzymatic active sites, membrane-spanning regions, and signal sequences. Nucleic
acid target motifs include, but are not limited to, promoter sequences, hairpin
structures and inducible expression elements (protein binding sequences).

A variety of structural formats for the input and output means can be used to
input and output the information in the computer-based systems of the present
invention. A preferred format for an output means ranks fragments of the E. cloacae
genome possessing varying degrees of homology to the target sequence or target
motif. Such presentation provides a person skilled in the art with a ranking of
sequences which contain various amounts of the target sequence or target motif and
identifies the degree of homology contained in the identified fragment.
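
A ranked output of the kind described above might be produced as in the following sketch. It scores each stored fragment by the best ungapped fraction of identical positions over all placements of the target within the fragment, then sorts the fragments from most to least similar. The scoring rule, the function names, and the fragment names in the usage example are hypothetical; an actual output means would more likely rank by the alignment score or significance value reported by the search program.

```python
def best_ungapped_identity(target, fragment):
    """Highest fraction of identical positions over all ungapped
    placements of the target within the fragment."""
    n = len(target)
    if n == 0 or len(fragment) < n:
        return 0.0
    best = 0
    for offset in range(len(fragment) - n + 1):
        window = fragment[offset:offset + n]
        matches = sum(x == y for x, y in zip(target, window))
        best = max(best, matches)
    return best / n

def rank_fragments(target, fragments):
    """Return (name, identity) pairs sorted from most to least similar."""
    scored = [(name, best_ungapped_identity(target, seq))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical fragments used only to show the shape of the output.
ranking = rank_fragments(
    "ACGT",
    {"frag_a": "TTACGTTT", "frag_b": "TTACCTTT", "frag_c": "GGGGGGGG"},
)
```

Each entry of the returned list pairs a fragment with its degree of identity to the target, so the list doubles as the ranking and the per-fragment homology report.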

A variety of comparing means can be used to compare a target sequence or
target motif with the data storage means to identify sequence fragments of the E.
cloacae genome. In the present examples, software implementing the
BLASTP2 and Bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990);
Compugen Bioccelerator) was used to identify open reading frames within the E.
cloacae genome. A person skilled in the art can readily recognize that any one of the
publicly available homology search programs can be used as the search means for the
computer-based systems of the present invention. Suitable programs are described, for
example, in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition,
Academic Press, San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne
Harwood Peruski, The Internet and the New Biology: Tools for Genomic and
Molecular Research, American Society for Microbiology, Washington, D.C. (1997).

The invention features E. cloacae polypeptides, preferably a substantially pure
preparation of an E. cloacae polypeptide, or a recombinant E. cloacae polypeptide. In
preferred embodiments: the polypeptide has biological activity; the polypeptide has an
amino acid sequence at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99%
identical to an amino acid sequence of the invention contained in the Sequence
Listing, preferably it has about 65% sequence identity with an amino acid sequence of
the invention contained in the Sequence Listing, and most preferably it has about 92%
to about 99% sequence identity with an amino acid sequence of the invention
contained in the Sequence Listing; the polypeptide has an amino acid sequence
essentially the same as an amino acid sequence of the invention contained in the
Sequence Listing; the polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino
acid residues in length; the polypeptide includes at least about 5, preferably at least
about 10, more preferably at least about 20, still more preferably at least about 50,
100, or 150 contiguous amino acid residues of the invention contained in the
Sequence Listing. In yet another preferred embodiment, an amino acid sequence
which differs in sequence identity by about 7% to about 8% from the E. cloacae
amino acid sequences of the invention contained in the Sequence Listing is also
encompassed by the invention.

In preferred embodiments: the E. cloacae polypeptide is encoded by a nucleic
acid of the invention contained in the Sequence Listing, or by a nucleic acid having at
least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid
of the invention contained in the Sequence Listing.
In a preferred embodiment, the subject E. cloacae polypeptide differs in amino
acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the
invention contained in the Sequence Listing. The differences, however, are such that
the E. cloacae polypeptide exhibits an E. cloacae biological activity, e.g., the E.
cloacae polypeptide retains a biological activity of a naturally occurring E. cloacae
enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an
amino acid sequence of the invention contained in the Sequence Listing; fused, in
reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the
invention contained in the Sequence Listing.

In yet other preferred embodiments, the E. cloacae polypeptide is a
recombinant fusion protein having a first E. cloacae polypeptide portion and a second
polypeptide portion, e.g., a second polypeptide portion having an amino acid sequence
unrelated to E. cloacae. The second polypeptide portion can be, e.g., any of
glutathione-S-transferase, a DNA binding domain, or a polymerase activating domain.
In a preferred embodiment, the fusion protein can be used in a two-hybrid assay.

Polypeptides of the invention include those which arise as a result of
alternative transcription events, alternative RNA splicing events, and alternative
translational and posttranslational events.

In a preferred embodiment, the encoded E. cloacae polypeptide differs (e.g., by
amino acid substitution, addition or deletion of at least one amino acid residue) in
amino acid sequence at about 1, 2, 3, 5, 10 or more residues, from a sequence of the
invention contained in the Sequence Listing. The differences, however, are such that
the encoded E. cloacae polypeptide exhibits an E. cloacae biological activity, e.g., the
encoded E. cloacae polypeptide retains a biological activity of a naturally occurring E.
cloacae enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment
of an amino acid sequence of the invention contained in the Sequence Listing; fused,
in reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the
invention contained in the Sequence Listing.

The E. cloacae strain, 15842, from which genomic sequences have been
sequenced, was deposited on August 22, 1997, in the American Type Culture
Collection and assigned the ATCC designation # 202023.

Included in the invention are: allelic variations; natural mutants; induced
mutants; proteins encoded by DNA that hybridizes under high or low stringency
conditions to a nucleic acid which encodes a polypeptide of the invention contained in
the Sequence Listing (for definitions of high and low stringency see Current Protocols
in Molecular Biology, John Wiley & Sons, New York, 1989, 6.3.1 - 6.3.6, hereby
incorporated by reference); and, polypeptides specifically bound by antisera to E.
cloacae polypeptides, especially by antisera to an active site or binding domain of an E.
cloacae polypeptide. The invention also includes fragments, preferably biologically
active fragments. These and other polypeptides are also referred to herein as E.
cloacae polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA, encoding a 
polypeptide of the invention. This includes double stranded nucleic acids as well as 
coding and antisense single strands. 

In preferred embodiments, the subject E. cloacae nucleic acid will include a
transcriptional regulatory sequence, e.g., at least one of a transcriptional promoter or
transcriptional enhancer sequence, operably linked to the E. cloacae gene sequence,
e.g., to render the E. cloacae gene sequence suitable for expression in a recombinant
host cell.

In yet a further preferred embodiment, the nucleic acid which encodes an E.
cloacae polypeptide of the invention hybridizes under stringent conditions to a
nucleic acid probe corresponding to at least about 8 consecutive nucleotides of the
invention contained in the Sequence Listing; more preferably to at least about 12
consecutive nucleotides of the invention contained in the Sequence Listing; still more
preferably to at least about 20 consecutive nucleotides of the invention contained in
the Sequence Listing; most preferably to at least about 40 consecutive nucleotides of
the invention contained in the Sequence Listing.

In another aspect, the invention provides a substantially pure nucleic acid
having a nucleotide sequence which encodes an E. cloacae polypeptide. In preferred
embodiments: the encoded polypeptide has biological activity; the encoded
polypeptide has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%,
98% or 99% homologous to an amino acid sequence of the invention contained in the
Sequence Listing; the encoded polypeptide has an amino acid sequence essentially the
same as an amino acid sequence of the invention contained in the Sequence Listing;
the encoded polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino acids in
length; the encoded polypeptide comprises at least about 5, preferably at least about
10, more preferably at least about 20, still more preferably at least about 50, 100, or
150 contiguous amino acids of the invention contained in the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic
acid which encodes an E. cloacae polypeptide or an E. cloacae polypeptide variant as
described herein; a host cell transfected with the vector; and a method of producing a
recombinant E. cloacae polypeptide or E. cloacae polypeptide variant, including
culturing the cell, e.g., in a cell culture medium, and isolating an E. cloacae polypeptide
or E. cloacae polypeptide variant, e.g., from the cell or from the cell culture medium.

One embodiment of the invention is directed to substantially isolated nucleic
acids. Nucleic acids of the invention include sequences comprising at least about 8
nucleotides in length, more preferably at least about 12 nucleotides in length, even
more preferably at least about 15-20 nucleotides in length, that correspond to a
subsequence of any one of SEQ ID NO: 1 - SEQ ID NO: 5662 or complements
thereof. Alternatively, the nucleic acids comprise sequences contained within any
ORF (open reading frame), including a complete protein-coding sequence, of which
any of SEQ ID NO: 1 - SEQ ID NO: 5662 forms a part. The invention encompasses
sequence-conservative variants and function-conservative variants of these sequences.
The nucleic acids may be DNA, RNA, DNA/RNA duplexes, protein-nucleic acid
(PNA), or derivatives thereof.

In another aspect, the invention features a purified recombinant nucleic acid
having at least about 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with
a sequence of the invention contained in the Sequence Listing.

The invention also encompasses recombinant DNA (including DNA cloning
and expression vectors) comprising these E. cloacae-derived sequences; host cells
comprising such DNA, including fungal, bacterial, yeast, plant, insect, and mammalian
host cells; and methods for producing expression products comprising RNA and
polypeptides encoded by the E. cloacae sequences. These methods are carried out by
incubating a host cell comprising an E. cloacae-derived nucleic acid sequence under
conditions in which the sequence is expressed. The host cell may be native or
recombinant. The polypeptides can be obtained by (a) harvesting the incubated cells
to produce a cell fraction and a medium fraction; and (b) recovering the E. cloacae
polypeptide from the cell fraction, the medium fraction, or both. The polypeptides
can also be made by in vitro translation.

In another aspect, the invention features nucleic acids capable of binding
mRNA of E. cloacae. Such nucleic acid is capable of acting as antisense nucleic acid
to control the translation of mRNA of E. cloacae. A further aspect features a nucleic
acid which is capable of binding specifically to an E. cloacae nucleic acid. These
nucleic acids are also referred to herein as complements and have utility as probes and
as capture reagents.

In another aspect, the invention features an expression system comprising an
open reading frame corresponding to E. cloacae nucleic acid. The nucleic acid further
comprises a control sequence compatible with an intended host. The expression
system is useful for making polypeptides corresponding to E. cloacae nucleic acid.

In another aspect, the invention encompasses: a vector including a nucleic acid
which encodes an E. cloacae polypeptide or an E. cloacae polypeptide variant as
described herein; a host cell transfected with the vector; and a method of producing a
recombinant E. cloacae polypeptide or E. cloacae polypeptide variant, including
culturing the cell, e.g., in a cell culture medium, and isolating the E. cloacae polypeptide
or E. cloacae polypeptide variant, e.g., from the cell or from the cell culture medium.

Yet another embodiment of the invention encompasses reagents for detecting
bacterial infection, including E. cloacae infection, which comprise at least one E.
cloacae-derived nucleic acid defined by any one of SEQ ID NO: 1 - SEQ ID NO:
5662, or sequence-conservative or function-conservative variants thereof.
Alternatively, the diagnostic reagents comprise nucleotide sequences that are contained
within any open reading frames (ORFs), including preferably complete protein-coding
sequences, contained within any of SEQ ID NO: 1 - SEQ ID NO: 5662, or
polypeptide sequences contained within any of SEQ ID NO: 5663 - SEQ ID NO:
11324, or polypeptides of which any of the above sequences forms a part, or
antibodies directed against any of the above peptide sequences or
function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies,
which specifically bind to the polypeptides of the invention. Methods are also
provided for producing antibodies in a host animal. The methods of the invention
comprise immunizing an animal with at least one E. cloacae-derived immunogenic
component, wherein the immunogenic component comprises one or more of the
polypeptides encoded by any one of SEQ ID NO: 1 - SEQ ID NO: 5662 or
sequence-conservative or function-conservative variants thereof; or polypeptides that are
contained within any ORFs, including complete protein-coding sequences, of which
any of SEQ ID NO: 1 - SEQ ID NO: 5662 forms a part; or polypeptide sequences
contained within any of SEQ ID NO: 5663 - SEQ ID NO: 11324; or polypeptides of
which any of SEQ ID NO: 5663 - SEQ ID NO: 11324 forms a part. Host animals
include any warm blooded animal, including without limitation mammals and birds.
Such antibodies have utility as reagents for immunoassays to evaluate the abundance
and distribution of E. cloacae-specific antigens.

In yet another aspect, the invention provides diagnostic methods for detecting
E. cloacae antigenic components or anti-E. cloacae antibodies in a sample. E.
cloacae antigenic components may be detected by known processes, including but not
limited to detection by a process comprising: (i) contacting a sample suspected to
contain a bacterial antigenic component with a bacterial-specific antibody, under
conditions in which a stable antigen-antibody complex can form between the antibody
and bacterial antigenic components in the sample; and (ii) detecting any
antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody
complex indicates the presence of at least one bacterial antigenic component in the
sample. In different embodiments of this method, the antibodies used are directed
against a sequence encoded by any of SEQ ID NO: 1 - SEQ ID NO: 5662 or
sequence-conservative or function-conservative variants thereof, or against a
polypeptide sequence contained in any of SEQ ID NO: 5663 - SEQ ID NO: 11324 or
function-conservative variants thereof.

In yet another aspect, the invention provides a method for detecting 
antibacterial-specific antibodies in a sample, which comprises: (i) contacting a sample 
suspected to contain antibacterial-specific antibodies with an E. cloacae antigenic 
component, under conditions in which a stable antigen-antibody complex can form 
between the E. cloacae antigenic component and antibacterial antibodies in the 
sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein 
detection of an antigen-antibody complex indicates the presence of antibacterial 
antibodies in the sample. In different embodiments of this method, the antigenic 
component is encoded by a sequence contained in any of SEQ ID NO: 1 - SEQ ID 
NO: 5662 or sequence-conservative and function-conservative variants thereof, or is a 
polypeptide sequence contained in any of SEQ ID NO: 5663 - SEQ ID NO: 11324 or 
function-conservative variants thereof. 

In another aspect, the invention features a method of generating vaccines for 
immunizing an individual against E. cloacae. The method includes: immunizing a 
subject with an E. cloacae polypeptide, e.g., a surface or secreted polypeptide, or a 
combination of such peptides or active portion(s) thereof, and a pharmaceutically 
acceptable carrier. Such vaccines have therapeutic and prophylactic utilities. 

In another aspect, the invention features a method of evaluating a compound, 
e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind 
an E. cloacae polypeptide. The method includes contacting the compound to be 
evaluated with an E. cloacae polypeptide and determining if the compound binds or 
otherwise interacts with the E. cloacae polypeptide. Compounds which bind or 
otherwise interact with E. cloacae polypeptides are candidates as modulators, 
including activators and inhibitors, of the bacterial life cycle. These assays can be 
performed in vitro or in vivo. 

In another aspect, the invention features a method of evaluating a compound, 
e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind 
an E. cloacae nucleic acid, e.g., DNA or RNA. The method includes contacting the 
compound to be evaluated with an E. cloacae nucleic acid and determining if the 
compound binds or otherwise interacts with the E. cloacae nucleic acid. Compounds 
which bind E. cloacae nucleic acids are candidates as modulators, including activators 
and inhibitors, of the bacterial life cycle. These assays can be performed in vitro or 
in vivo. 

A particularly preferred embodiment of the invention is directed to a method of 
screening test compounds for anti-bacterial activity, which method comprises: 
selecting as a target a bacterial specific sequence, which sequence is essential to the 
viability of a bacterial species; contacting a test compound with said target sequence; 
and selecting those test compounds which bind to said target sequence as potential 
anti-bacterial candidates. In one embodiment, the target sequence selected is specific 
to a single species, or even a single strain, such as, for example, the strain E. cloacae 
15842. In a second embodiment, the target sequence is common to at least two 
species of bacteria. In a third embodiment, the target sequence is common to a family 
of bacteria. The target sequence may be a nucleic acid sequence or a polypeptide 
sequence. Methods employing sequences common to more than one species of 
microorganism may be used to screen candidates for broad spectrum anti-bacterial 
activity. 

The invention also provides methods for preventing or treating disease caused 
by certain bacteria, including E. cloacae, which are carried out by administering to an 
animal in need of such treatment, in particular a warm-blooded vertebrate, including 
but not limited to birds and mammals, a compound that specifically inhibits or 
interferes with the function of a bacterial polypeptide or nucleic acid. In a particularly 
preferred embodiment, the mammal to be treated is human. 

DETAILED DESCRIPTION OF THE INVENTION 

The sequences of the present invention include the specific nucleic acid and 
amino acid sequences set forth in the Sequence Listing that forms a part of the present 
specification, and which are designated SEQ ID NO: 1 - SEQ ID NO: 11324. Use of 
the terms "SEQ ID NO: 1 - SEQ ID NO: 5662", "SEQ ID NO: 5663 - SEQ ID NO: 
11324", "the sequences depicted in Table 2", etc., is intended, for convenience, to refer 
to each individual SEQ ID NO individually, and is not intended to refer to the genus 
of these sequences unless such reference would be indicated. In other words, it is a 
shorthand for listing all of these sequences individually. The invention encompasses 
each sequence individually, as well as any combination thereof. 

Definitions 

"Nucleic acid" or "polynucleotide" as used herein refers to purine- and 
pyrimidine-containing polymers of any length, either polyribonucleotides or 
polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This includes 
single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA 
hybrids, as well as "protein nucleic acids" (PNA) formed by conjugating bases to an 
amino acid backbone. This also includes nucleic acids containing modified bases. 

A nucleic acid or polypeptide sequence that is "derived from" a designated 
sequence refers to a sequence that corresponds to a region of the designated sequence. 
For nucleic acid sequences, this encompasses sequences that are homologous or 
complementary to the sequence, as well as "sequence-conservative variants" and 
"function-conservative variants." For polypeptide sequences, this encompasses 
"function-conservative variants." Sequence-conservative variants are those in which a 
change of one or more nucleotides in a given codon position results in no alteration in 
the amino acid encoded at that position. Function-conservative variants are those in 
which a given amino acid residue in a polypeptide has been changed without altering 
the overall conformation and function of the native polypeptide, including, but not 
limited to, replacement of an amino acid with one having similar physico-chemical 
properties (such as, for example, acidic, basic, hydrophobic, and the like). 
"Function-conservative" variants also include any polypeptides that have the ability to 
elicit antibodies specific to a designated polypeptide. 
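
The definition of a sequence-conservative variant can be illustrated with a short 
sketch. It is not part of the disclosed methods; the function names are hypothetical, 
and the standard genetic code is assumed. Two coding sequences count as 
sequence-conservative variants of one another here when their nucleotides differ but 
every codon still encodes the same amino acid. 

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG x TCAG x TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_sequence_conservative(reference, variant):
    """True if the sequences differ in nucleotides but encode identical amino acids."""
    reference, variant = reference.upper(), variant.upper()
    return (len(reference) == len(variant)
            and reference != variant
            and translate(reference) == translate(variant))
```

For instance, changing GCT to GCC leaves alanine unchanged, so such a change would be 
sequence-conservative, whereas changing GCT to GAT replaces alanine with aspartic acid 
and would not. 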

An "E. cloacae-derived" nucleic acid or polypeptide sequence may or may not 
be present in other bacterial species, and may or may not be present in all E. cloacae 
strains. This term is intended to refer to the source from which the sequence was 
originally isolated. Thus, an E. cloacae-derived polypeptide, as used herein, may be 
used, e.g., as a target to screen for a broad spectrum antibacterial agent, or to search 
for homologous proteins in other species of bacteria or in eukaryotic organisms such 
as humans, etc. 

A purified or isolated polypeptide or a substantially pure preparation of a 
polypeptide are used interchangeably herein and, as used herein, mean a polypeptide 
that has been separated from other proteins, lipids, and nucleic acids with which it 
naturally occurs. Preferably, the polypeptide is also separated from substances, e.g., 
antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, 
the polypeptide constitutes at least about 10, 20, 50, 70, 80 or 95% dry weight of the 
purified preparation. Preferably, the preparation contains sufficient polypeptide to 
allow protein sequencing; at least about 1, 10, or preferably 100 mg of polypeptide. 

A purified preparation of cells refers to, in the case of plant or animal cells, an 
in vitro preparation of cells and not an entire intact plant or animal. In the case of 
cultured cells or microbial cells, it consists of a preparation of at least about 10%, 
more preferably at least about 50%, of the subject cells. 

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially 
pure DNA, (are terms used interchangeably herein) is a nucleic acid which is one or 
both of the following: not immediately contiguous with both of the coding sequences 
with which it is immediately contiguous (i.e., one at the 5' end and one at the 3' end) 
in the naturally-occurring genome of the organism from which the nucleic acid is 
derived; or which is substantially free of a nucleic acid with which it occurs in the 
organism from which the nucleic acid is derived. The term includes, for example, a 
recombinant DNA which is incorporated into a vector, e.g., into an autonomously 
replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, 
or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment 
produced by PCR or restriction endonuclease treatment) independent of other DNA 
sequences. Substantially pure DNA also includes a recombinant DNA which is part 
of a hybrid gene encoding additional E. cloacae DNA sequence. 

A "contig" as used herein is a nucleic acid representing a continuous stretch of 
genomic sequence of an organism. 

An "open reading frame", also referred to herein as ORF, is a region of nucleic 
acid which encodes a polypeptide. This region may represent a portion of a coding 
sequence or a total sequence and can be determined from a stop to stop codon or from 
a start to stop codon. 

As used herein, a "coding sequence" is a nucleic acid which is transcribed into 
messenger RNA and/or translated into a polypeptide when placed under the control of 
appropriate regulatory sequences. The boundaries of the coding sequence are 
determined by a translation start codon at the five prime terminus and a translation 
stop codon at the three prime terminus. A coding sequence can include but is not 
limited to messenger RNA, synthetic DNA, and recombinant nucleic acid sequences. 

A "complement" of a nucleic acid as used herein refers to an anti-parallel or 
antisense sequence that participates in Watson-Crick base-pairing with the original 
sequence. 
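
As an illustration only, the anti-parallel complement of a DNA sequence can be 
computed as follows. The function name is hypothetical, and the sketch assumes 
unmodified A, C, G and T bases. 

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the anti-parallel complement, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]
```

Reversing the complemented string reflects the anti-parallel orientation of the two 
paired strands. 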

A "gene product" is a protein or structural RNA which is specifically encoded 
by a gene. 

As used herein, the term "probe" refers to a nucleic acid, peptide or other 
chemical entity which specifically binds to a molecule of interest. Probes are often 
associated with or capable of associating with a label. A label is a chemical moiety 
capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and 
chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification 
sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity 
which specifically binds to a molecule of interest and immobilizes such molecule is 
referred to herein as a "capture ligand". Capture ligands are typically associated with 
or capable of associating with a support such as nitro-cellulose, glass, nylon membranes, 
beads, particles and the like. The specificity of hybridization is dependent on 
conditions such as the base pair composition of the nucleotides, and the temperature 
and salt concentration of the reaction. These conditions are readily discernible to one 
of ordinary skill in the art using routine experimentation. 

"Homologous" refers to the sequence similarity or sequence identity between 
two polypeptides or between two nucleic acid molecules. When a position in both of 
the two compared sequences is occupied by the same base or amino acid monomer 
subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then 
the molecules are homologous at that position. The percent of homology between two 
sequences is a function of the number of matching or homologous positions shared by 
the two sequences divided by the number of positions compared x 100. For example, 
if 6 of 10 of the positions in two sequences are matched or homologous then the two 
sequences are 60% homologous. By way of example, the DNA sequences ATTGCC 
and TATGGC share 50% homology. Generally, a comparison is made when two 
sequences are aligned to give maximum homology. 
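
The percent-homology calculation described above can be sketched as follows. This is 
a minimal illustration that assumes the two sequences have already been aligned and 
have equal length; it does not search for the alignment giving maximum homology, and 
the function name is hypothetical. 

```python
def percent_homology(seq_a, seq_b):
    """Matching positions divided by positions compared, times 100."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_homology("ATTGCC", "TATGGC"))  # 3 of 6 positions match
```

For ATTGCC and TATGGC, positions 3, 4 and 6 match, which gives the 50% figure stated 
in the text. 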

Nucleic acids are hybridizable to each other when at least one strand of a 
nucleic acid can anneal to the other nucleic acid under defined stringency conditions. 
Stringency of hybridization is determined by: (a) the temperature at which 
hybridization and/or washing is performed; and (b) the ionic strength and polarity of 
the hybridization and washing solutions. Hybridization requires that the two nucleic 
acids contain complementary sequences; depending on the stringency of hybridization, 
however, mismatches may be tolerated. Typically, hybridization of two sequences at 
high stringency (such as, for example, in a solution of 0.5X SSC, at 65°C) requires 
that the sequences be essentially completely homologous. Conditions of intermediate 
stringency (such as, for example, 2X SSC at 65°C) and low stringency (such as, for 
example, 2X SSC at 55°C) require correspondingly less overall complementarity 
between the hybridizing sequences. (1X SSC is 0.15 M NaCl, 0.015 M Na citrate). 
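
As a worked example of the salt concentrations implied by the SSC strengths named 
above, the following sketch scales the stated 1X composition. The function name is 
hypothetical, and only the two components given in the text are considered. 

```python
# Molar composition of 1X SSC as stated in the text.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_molarity(fold):
    """Molar concentration of each component at the given SSC strength (e.g. 0.5 or 2)."""
    return {salt: fold * molar for salt, molar in SSC_1X.items()}


for fold in (0.5, 2):
    print(fold, ssc_molarity(fold))
```

On this arithmetic, 0.5X SSC contains 0.075 M NaCl and 0.0075 M Na citrate, and 2X SSC 
contains 0.3 M NaCl and 0.03 M Na citrate, so the high-stringency example uses a 
fourfold lower ionic strength than the intermediate and low-stringency examples. 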

The terms peptides, proteins, and polypeptides are used interchangeably herein. 

As used herein, the term "surface protein" refers to all surface accessible 
proteins, e.g. inner and outer membrane proteins, proteins adhering to the cell wall, 
and secreted proteins. 

A polypeptide has E. cloacae biological activity if it has one, two or preferably 
more of the following properties: (1) if when expressed in the course of an E. 
cloacae infection, it can promote, or mediate the attachment of E. cloacae to a cell; 
(2) it has an enzymatic activity, structural or regulatory function characteristic of an E. 
cloacae protein; (3) the gene which encodes it can rescue a lethal mutation in an E. 
cloacae gene. A polypeptide has biological activity if it is an antagonist, agonist, or 
super-agonist of a polypeptide having one of the above-listed properties. 

A biologically active fragment or analog is one having an in vivo or in vitro 
activity which is characteristic of the E. cloacae polypeptides of the invention 
contained in the Sequence Listing, or of other naturally occurring E. cloacae 
polypeptides, e.g., one or more of the biological activities described herein. 
Especially preferred are fragments which exist in vivo, e.g., fragments which arise 
from post transcriptional processing or which arise from translation of alternatively 
spliced RNAs. Fragments include those expressed in native or endogenous cells as 
well as those made in expression systems, e.g., in CHO (Chinese Hamster Ovary) 
cells. Because peptides such as E. cloacae polypeptides often exhibit a range of 
physiological properties and because such properties may be attributable to different 
portions of the molecule, a useful E. cloacae fragment or E. cloacae analog is one 
which exhibits a biological activity in any biological assay for E. cloacae activity. 
The fragment or analog possesses about 10%, preferably about 40%, more preferably 
about 60%, 70%, 80% or 90% or greater of the activity of E. cloacae, in any in vivo 
or in vitro assay. 

Analogs can differ from naturally occurring E. cloacae polypeptides in amino 
acid sequence or in ways that do not involve sequence, or both. Non-sequence 
modifications include changes in acetylation, methylation, phosphorylation, 
carboxylation, or glycosylation. Preferred analogs include E. cloacae polypeptides (or 
biologically active fragments thereof) whose sequences differ from the wild-type 
sequence by one or more conservative amino acid substitutions or by one or more 
non-conservative amino acid substitutions, deletions, or insertions which do not 
substantially diminish the biological activity of the E. cloacae polypeptide. 
Conservative substitutions typically include the substitution of one amino acid for 
another with similar characteristics, e.g., substitutions within the following groups: 
valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic 
acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, 
tyrosine. Other conservative substitutions can be made in view of the table below. 
TABLE 1 

CONSERVATIVE AMINO ACID REPLACEMENTS 

For Amino Acid   Code   Replace with any of 
Alanine          A      D-Ala, Gly, beta-Ala, L-Cys, D-Cys 
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, 
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln 
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln 
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr 
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp 
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln 
Glycine          G      Ala, D-Ala, Pro, D-Pro, beta-Ala, Acp 
Isoleucine       I      D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met 
Leucine          L      D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met 
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, 
Methionine       M      D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D- 
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D- 
Proline          P      D-Pro, L-I-thioazolidine-4-carboxylic acid, D- or L- 
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O), 
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O), 
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His 
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met 

Other analogs within the invention are those with modifications which increase 
peptide stability; such analogs may contain, for example, one or more non-peptide 
bonds (which replace the peptide bonds) in the peptide sequence. Also included are: 
analogs that include residues other than naturally occurring L-amino acids, e.g., 
D-amino acids or non-naturally occurring or synthetic amino acids, e.g., β or γ amino 
acids; and cyclic analogs. 
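
The conservative substitution groups named in the text preceding Table 1 can be 
encoded as a simple lookup. The sketch below is illustrative only: it covers just 
those eight groups, written with one-letter amino acid codes, and not the fuller set 
of replacements in Table 1; the function name is hypothetical. 

```python
# Groups from the text: Val/Gly; Gly/Ala; Val/Ile/Leu; Asp/Glu; Asn/Gln;
# Ser/Thr; Lys/Arg; Phe/Tyr (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("VG"), set("GA"), set("VIL"), set("DE"),
    set("NQ"), set("ST"), set("KR"), set("FY"),
]


def is_conservative_substitution(original, replacement):
    """True if both residues differ and fall within at least one listed group."""
    original, replacement = original.upper(), replacement.upper()
    return original != replacement and any(
        original in group and replacement in group for group in CONSERVATIVE_GROUPS
    )
```

Because membership is checked group by group, glycine to leucine is not treated as 
conservative even though each residue shares a group with valine. 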

As used herein, a "fragment", as applied to an E. cloacae analog, will 
ordinarily be at least about 20 residues, more typically at least about 40 residues, 
preferably at least about 60 residues in length. Fragments of E. cloacae polypeptides 
can be generated by methods known to those skilled in the art. The ability of an 
Enterobacter fragment to exhibit a biological activity of an E. cloacae polypeptide can 
be assessed by methods known to those skilled in the art as described herein. Also 
included are E. cloacae polypeptides containing residues that are not required for 
biological activity of the peptide or that result from alternative mRNA splicing or 
alternative protein processing events. 

An "immunogenic component" as used herein is a moiety, such as an E. 
cloacae polypeptide, analog or fragment thereof, that is capable of eliciting a humoral 
and/or cellular immune response in a host animal. 

An "antigenic component" as used herein is a moiety, such as an E. cloacae 
polypeptide, analog or fragment thereof, that is capable of binding to a specific 
antibody with sufficiently high affinity to form a detectable antigen-antibody complex. 

The term "antibody" as used herein is intended to include fragments thereof 
which are specifically reactive with E. cloacae polypeptides. 

As used herein, the term "cell-specific promoter" means a DNA sequence that 
serves as a promoter, i.e., regulates expression of a selected DNA sequence operably 
linked to the promoter, and which effects expression of the selected DNA sequence in 
specific cells of a tissue. The term also covers so-called "leaky" promoters, which 
regulate expression of a selected DNA primarily in one tissue, but cause expression in 
other tissues as well. 

Misexpression, as used herein, refers to a non-wild type pattern of gene 
expression. It includes: expression at non-wild type levels, i.e., over or under 
expression; a pattern of expression that differs from wild type in terms of the time or 
stage at which the gene is expressed, e.g., increased or decreased expression (as 
compared with wild type) at a predetermined developmental period or stage; a pattern 
of expression that differs from wild type in terms of increased expression (as 
compared with wild type) in a predetermined cell type or tissue type; a pattern of 
expression that differs from wild type in terms of the splicing size, amino acid 
sequence, post-translational modification, or biological activity of the expressed 
polypeptide; a pattern of expression that differs from wild type in terms of the effect 
of an environmental stimulus or extracellular stimulus on expression of the gene, e.g., 
a pattern of increased or decreased expression (as compared with wild type) in the 
presence of an increase or decrease in the strength of the stimulus. 

As used herein, "host cells" and other such terms denoting microorganisms or 
higher eukaryotic cell lines cultured as unicellular entities refer to cells which can 
become or have been used as recipients for a recombinant vector or other transfer 
DNA, and include the progeny of the original cell which has been transfected. It is 
understood by individuals skilled in the art that the progeny of a single parental cell 
may not necessarily be completely identical in genomic or total DNA complement to 
the original parent, due to accidental or deliberate mutation. 

As used herein, the term "control sequence" refers to a nucleic acid having a 
base sequence which is recognized by the host organism to effect the expression of 
encoded sequences to which it is ligated. The nature of such control sequences 
differs depending upon the host organism; in prokaryotes, such control sequences 
generally include a promoter, ribosomal binding site, terminators, and in some cases 
operators; in eukaryotes, generally such control sequences include promoters, 
terminators and in some instances, enhancers. The term control sequence is intended 
to include, at a minimum, all components whose presence is necessary for expression, 
and may also include additional components whose presence is advantageous, for 
example, leader sequences. 

As used herein, the term "operably linked" refers to sequences joined or ligated 
to function in their intended manner. For example, a control sequence is operably 
linked to a coding sequence by ligation in such a way that expression of the coding 
sequence is achieved under conditions compatible with the control sequence and host 
cell. 

The "metabolism" of a substance, as used herein, means any aspect of the 
expression, function, action, or regulation of the substance. The metabolism of a 
substance includes modifications, e.g., covalent or non-covalent modifications of the 
substance. The metabolism of a substance includes modifications, e.g., covalent or 
non-covalent modification, the substance induces in other substances. The metabolism 
of a substance also includes changes in the distribution of the substance. The 
metabolism of a substance includes changes the substance induces in the distribution 
of other substances. 

A "sample" as used herein refers to a biological sample, such as, for example, 
tissue or fluid isolated from an individual (including without limitation plasma, serum, 
cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell 
culture constituents, as well as samples from the environment. 

Technical and scientific terms used herein have the meanings commonly 
understood by one of ordinary skill in the art to which the present invention pertains, 
unless otherwise defined. Reference is made herein to various methodologies known 
to those of skill in the art. Publications and other materials setting forth such known 
methodologies to which reference is made are incorporated herein by reference in their 
entireties as though set forth in full. The practice of the invention will employ, unless 
otherwise indicated, conventional techniques of chemistry, molecular biology, 
microbiology, recombinant DNA, and immunology, which are within the skill of the 
art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch, 
and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning, 
Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 
1984); Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds. 1984); the series, 
Methods in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 
(Wu and Grossman, eds.); PCR-A Practical Approach (McPherson, Quirke, and 
Taylor, eds., 1991); Immunology, 2d Edition, 1989, Roitt et al., C.V. Mosby 
Company, New York; Advanced Immunology, 2d Edition, 1991, Male et al., 
Gower Medical Publishing, New York; DNA Cloning: A Practical Approach, 
Volumes I and II, 1985 (D.N. Glover ed.); Oligonucleotide Synthesis, 1984 (M.J. 
Gait ed.); Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell 
Culture, 1986 (R.I. Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press); 
Perbal, 1984, A Practical Guide to Molecular Cloning; Gene Transfer Vectors for 
Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor 
Laboratory); Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, 
Academic Press, San Diego, CA (1998); and Leonard F. Peruski, Jr., and Anne 
Harwood Peruski, The Internet and the New Biology: Tools for Genomic and 
Molecular Research, American Society for Microbiology, Washington, D.C. (1997). 

Any suitable materials and/or methods known to those of skill can be utilized 
in carrying out the present invention; however, preferred materials and/or methods are 
described. Materials, reagents and the like to which reference is made in the 
following description and examples are obtainable from commercial sources, unless 
otherwise noted. 

E. cloacae Genomic Sequence 

This invention provides nucleotide sequences of the genome of E. cloacae 
which thus comprises a DNA sequence library of E. cloacae genomic DNA. The 
detailed description that follows provides nucleotide sequences of E. cloacae, and also 
describes how the sequences were obtained and how ORFs and protein-coding 
sequences were identified. Also described are compositions and methods of using the 
disclosed E. cloacae sequences in methods including diagnostic and therapeutic 
applications. Furthermore, the library can be used as a database for identification and 
comparison of medically important sequences in this and other strains of E. cloacae. 

To determine the genomic sequence of E. cloacae, DNA from strain 15842 of 
E. cloacae was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, 
potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation 
(Soll, D.R., T. Srikantha and S.R. Lockhart: Characterizing Developmentally 
Regulated Genes in E. cloacae. In Microbial Genome Methods. K.W. Adolph, editor. 
CRC Press. New York, p 17-37.). DNA was sheared hydrodynamically using an 
HPLC (Oefner et al., 1996) to an insert size of 2000-3000 bp. After size 
fractionation by gel electrophoresis the fragments were blunt-ended, ligated to adapter 
oligonucleotides and cloned into the pGTC (Thomann) vector to construct a "shotgun" 
subclone library. 

DNA sequencing was achieved using established ABI sequencing methods on 
ABI377 automated DNA sequencers. The cloning and sequencing procedures are 
described in more detail in the Exemplification. 

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts 
of DOE Human Genome Program Contractor-Grantee Workshop V, Jan. 1996, p. 157). 
The average contig length was about 3-4 kb. 

All subsequent steps were based on sequencing by ABI377 automated DNA 
sequencing methods. 

A variety of approaches may be used to order the contigs so as to obtain a 
continuous sequence representing the entire E. cloacae genome. Synthetic 
oligonucleotides are designed that are complementary to sequences at the end of each 
contig. These oligonucleotides may be hybridized to libraries of E. cloacae genomic 
DNA in, for example, lambda phage vectors or plasmid vectors to identify clones that 
contain sequences corresponding to the junctional regions between individual contigs. 
Such clones are then used to isolate template DNA and the same oligonucleotides are 
used as primers in polymerase chain reaction (PCR) to amplify junctional fragments, 
the nucleotide sequence of which is then determined. 
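
The first step of this approach, choosing oligonucleotides from the ends of a contig, 
can be sketched as follows. This is a simplified illustration, not the procedure 
actually used: the function names and the default oligonucleotide length of 20 are 
assumptions, and real primer design would also consider melting temperature, 
repeats and secondary structure. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Anti-parallel complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def contig_end_oligos(contig, length=20):
    """Return (left, right) oligos, each written 5' to 3' and pointing off the contig.

    The left oligo is the reverse complement of the first `length` bases, so
    extension proceeds past the left end; the right oligo is the last `length`
    bases, so extension proceeds past the right end.
    """
    contig = contig.upper()
    if len(contig) < 2 * length:
        raise ValueError("contig is too short for two non-overlapping end oligos")
    return reverse_complement(contig[:length]), contig[-length:]
```

Pairing the right-end oligo of one contig with the left-end oligo of a neighbouring 
contig would, under these assumptions, amplify the junctional fragment between them. 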

The E. cloacae sequences were analyzed for the presence of open reading 
frames (ORFs) comprising at least 180 nucleotides. As a result of the analysis of 
ORFs based on stop-to-stop codon reads, it should be understood that these ORFs may 
not correspond to the ORF of a naturally-occurring E. cloacae polypeptide. These 
ORFs may contain start codons which indicate the initiation of protein synthesis of a 
naturally-occurring E. cloacae polypeptide. Such start codons within the ORFs 
provided herein were identified by those of ordinary skill in the relevant art, and the 
resulting ORF and the encoded E. cloacae polypeptide are within the scope of this 
invention. For example, within the ORFs a codon such as AUG or GUG (encoding 
methionine or valine) which is part of the initiation signal for protein synthesis was 
identified and the portion of an ORF corresponding to a naturally-occurring E. 
cloacae polypeptide was recognized. The predicted coding regions were defined by 
evaluating the coding potential of such sequences with the program GENEMARK™ 
(Borodovsky and McIninch, 1993, Comp. Chem. 17:123). 
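
A stop-to-stop ORF scan of the kind described above can be sketched as follows. This 
is an illustrative reconstruction under stated assumptions, not the software actually 
used: function names are hypothetical, only regions bounded by an in-frame stop codon 
on both sides are reported, all six reading frames are scanned, and coordinates for 
the "-" strand refer to the reverse-complemented sequence. Coding potential 
(GENEMARK) is not modelled. 

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # DNA equivalents of the AUG and GUG codons named in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(dna, min_len=180):
    """Return (strand, frame, start, end, sequence) for each stop-to-stop ORF.

    `start` is the index just after the upstream stop codon and `end` is the
    index of the downstream stop codon, so the stops themselves are excluded.
    """
    orfs = []
    forward = dna.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            prev_stop = None  # position of the most recent in-frame stop codon
            for pos in range(frame, len(seq) - 2, 3):
                if seq[pos:pos + 3] in STOPS:
                    if prev_stop is not None and pos - (prev_stop + 3) >= min_len:
                        start = prev_stop + 3
                        orfs.append((strand, frame, start, pos, seq[start:pos]))
                    prev_stop = pos
    return orfs


def first_start_codon(orf_sequence):
    """Offset of the first in-frame ATG or GTG within an ORF, or None if absent."""
    for i in range(0, len(orf_sequence) - 2, 3):
        if orf_sequence[i:i + 3] in STARTS:
            return i
    return None
```

The portion of a reported ORF from the first in-frame start codon onward is then a 
candidate for the naturally-occurring coding region, which is the distinction the 
paragraph above draws between a stop-to-stop ORF and the ORF of an actual polypeptide. 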

Each predicted ORF amino acid sequence was compared with all sequences
found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST
algorithm. BLAST identifies local alignments between the ORF sequence and the
sequences in the databank and estimates the probability that each alignment occurs
by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs
(probabilities less than 10^-5 by chance) and ORFs that are probably non-homologous
(probabilities greater than 10^-5 by chance) but have good codon usage were
identified. Both homologous sequences and non-homologous sequences with good codon
usage are likely to encode proteins and are encompassed by the invention.
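
The two categories described above can be expressed as a simple decision rule. The
sketch below is illustrative only: the function name, the labels, and the treatment
of a probability exactly equal to the threshold are assumptions, and the assessment
of "good codon usage" is taken as a precomputed flag rather than computed here.

```python
# Illustrative decision rule for the two ORF categories described in the text.
# A missing probability (None) is treated as "no database match".

def classify_orf(p_value, good_codon_usage, threshold=1e-5):
    """Classify an ORF from its BLAST chance probability and codon usage."""
    if p_value is not None and p_value < threshold:
        return "homologous"
    if good_codon_usage:
        return "non-homologous, good codon usage"
    return "not retained"
```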

E. cloacae Nucleic Acids 



The present invention provides a library of E. cloacae-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which are used as
markers in epidemiological studies. The present invention also provides a library of E.
cloacae-derived nucleic acid sequences which comprise or encode targets for
therapeutic drugs.

The nucleic acids of this invention may be obtained directly from the DNA of
the above referenced E. cloacae strain by using the polymerase chain reaction (PCR).
See "PCR, A Practical Approach" (McPherson, Quirke, and Taylor, eds., IRL Press,
Oxford, UK, 1991) for details about the PCR. High fidelity PCR is used to ensure a
faithful DNA copy prior to expression. In addition, the authenticity of amplified
products is verified by conventional sequencing methods. Clones carrying the desired
sequences described in this invention may also be obtained by screening the libraries
by means of the PCR or by hybridization of synthetic oligonucleotide probes to filter
lifts of the library colonies or plaques as known in the art (see, e.g., Sambrook et al.,
Molecular Cloning, A Laboratory Manual, 2nd edition, 1989, Cold Spring Harbor
Press, NY).

It is also possible to obtain nucleic acids encoding E. cloacae polypeptides
from a cDNA library in accordance with protocols herein described. A cDNA
encoding an E. cloacae polypeptide can be obtained by isolating total mRNA from an
appropriate strain. Double stranded cDNAs can then be prepared from the total
mRNA. Subsequently, the cDNAs can be inserted into a suitable plasmid or viral
(e.g., bacteriophage) vector using any one of a number of known techniques. Genes
encoding E. cloacae polypeptides can also be cloned using established polymerase
chain reaction techniques in accordance with the nucleotide sequence information
provided by the invention. The nucleic acids of the invention can be DNA or RNA.
Preferred nucleic acids of the invention are contained in the Sequence Listing.

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing
polydeoxynucleotides are known, including solid-phase synthesis which, like peptide
synthesis, has been fully automated in commercially available DNA synthesizers (See,
e.g., Itakura et al. U.S. Patent No. 4,598,049; Caruthers et al. U.S. Patent No.
4,458,066; and Itakura U.S. Patent Nos. 4,401,796 and 4,373,071, incorporated by
reference herein).

In another example, DNA can be chemically synthesized using, e.g., the
phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc.
103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well
known methods. This can be done by sequentially linking a series of oligonucleotide
cassettes comprising pairs of synthetic oligonucleotides, as described below.

Nucleic acids isolated or synthesized in accordance with features of the present
invention are useful, by way of example, without limitation, as probes, primers,
capture ligands, antisense genes and for developing expression systems for the
synthesis of proteins and peptides corresponding to such sequences. As probes,
primers, capture ligands and antisense agents, the nucleic acid normally consists of all
or part (approximately twenty or more nucleotides for specificity as well as the ability
to form stable hybridization products) of the nucleic acids of the invention contained
in the Sequence Listing. These uses are described in further detail below.

Probes

A nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can be used as a probe to specifically
detect E. cloacae. With the sequence information set forth in the present application,
sequences of twenty or more nucleotides are identified which provide the desired
inclusivity and exclusivity with respect to E. cloacae and extraneous nucleic acids
likely to be encountered under hybridization conditions. More preferably, the
sequence will comprise at least about twenty to thirty nucleotides to convey stability
to the hybridization product formed between the probe and the intended target
molecules.
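
As a simplified illustration of selecting probe sequences of twenty nucleotides with
the desired exclusivity, the sketch below lists every 20-mer of a target sequence
that has no exact match on either strand of a set of extraneous (background)
sequences. The function names are assumptions. Exact matching is used here only as a
stand-in: actual exclusivity depends on hybridization behavior, including partially
mismatched duplexes, which this sketch does not model.

```python
# Illustrative sketch: candidate probe 20-mers present in a target sequence and
# absent (as exact matches, on either strand) from background sequences.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all subsequences of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(target, background_seqs, k=20):
    """Return sorted k-mers of `target` not found in any background sequence."""
    background = set()
    for b in background_seqs:
        background |= kmers(b, k)
        background |= kmers(reverse_complement(b), k)
    return sorted(kmers(target, k) - background)
```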

Sequences larger than 1000 nucleotides in length are difficult to synthesize but
can be generated by recombinant DNA techniques. Individuals skilled in the art will
readily recognize that the nucleic acids, for use as probes, can be provided with a
label to facilitate detection of a hybridization product.

Nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can also be useful as probes to detect
homologous regions (especially homologous genes) of other Enterobacter species
using appropriate stringency hybridization conditions as described herein.

Capture Ligand 

For use as a capture ligand, the nucleic acid selected in the manner described
above with respect to probes can be readily associated with a support. The manner in
which nucleic acid is associated with supports is well known. Nucleic acid having
twenty or more nucleotides in a sequence of the invention contained in the Sequence
Listing has utility to separate E. cloacae nucleic acid of one strain from the
nucleic acid of another strain as well as from other organisms. Nucleic acid
having twenty or more nucleotides in a sequence of the invention contained in the
Sequence Listing can also have utility to separate other Enterobacter species from
each other and from other organisms. Preferably, the sequence will comprise at least
about twenty nucleotides to convey stability to the hybridization product formed
between the probe and the intended target molecules. Sequences larger than 1000
nucleotides in length are difficult to synthesize but can be generated by recombinant
DNA techniques.

Primers 

Nucleic acid isolated or synthesized in accordance with the sequences
described herein have utility as primers for the amplification of E. cloacae nucleic
acid. These nucleic acids may also have utility as primers for the amplification of
nucleic acids in other Enterobacter species. With respect to polymerase chain
reaction (PCR) techniques, nucleic acid sequences of > 10-15 nucleotides of the
invention contained in the Sequence Listing have utility in conjunction with suitable
enzymes and reagents to create copies of E. cloacae nucleic acid. More preferably,
the sequence will comprise twenty or more nucleotides to convey stability to the
hybridization product formed between the primer and the intended target molecules.
Binding conditions of primers greater than 100 nucleotides are more difficult to
control to obtain specificity. High fidelity PCR can be used to ensure a faithful DNA
copy prior to expression. In addition, amplified products can be checked by
conventional sequencing methods.
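
A minimal sketch of deriving a primer pair of twenty nucleotides from a known
sequence is given below. The function names are assumptions for illustration. The
melting temperature estimate uses the Wallace rule of thumb (2 degrees C per A or T,
4 degrees C per G or C), which is a rough approximation for short oligonucleotides
and is not part of the specification.

```python
# Illustrative sketch: forward and reverse PCR primers flanking a template,
# with a rough Wallace-rule melting temperature estimate.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pair(template, length=20):
    """Return (forward, reverse) primers spanning the whole template."""
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length]                       # identical to the top strand
    reverse = reverse_complement(template[-length:])  # anneals to the top strand
    return forward, reverse


def wallace_tm(primer):
    """Approximate melting temperature in degrees C (rule of thumb only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc
```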

The copies can be used in diagnostic assays to detect specific sequences,
including genes from E. cloacae and/or other Enterobacter species. The copies can
also be incorporated into cloning and expression vectors to generate polypeptides
corresponding to the nucleic acid synthesized by PCR, as is described in greater detail
herein.

The nucleic acids of the present invention find use as templates for the
recombinant production of E. cloacae-derived peptides or polypeptides.



Antisense 

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in
accordance with the sequences described herein have utility as antisense agents to
prevent the expression of E. cloacae genes. These sequences also have utility as
antisense agents to prevent expression of genes of other Enterobacter species.

In one embodiment, nucleic acid or derivatives corresponding to E. cloacae
nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for
introduction into bacterial cells. For example, a nucleic acid having twenty or more
nucleotides is capable of binding to bacterial nucleic acid or bacterial messenger RNA.
Preferably, the antisense nucleic acid is comprised of 20 or more nucleotides to
provide necessary stability of a hybridization product of non-naturally occurring
nucleic acid and bacterial nucleic acid and/or bacterial messenger RNA. Nucleic acid
having a sequence greater than 1000 nucleotides in length is difficult to synthesize but
can be generated by recombinant DNA techniques. Methods for loading antisense
nucleic acid in liposomes are known in the art as exemplified by U.S. Patent 4,241,046
issued December 23, 1980 to Papahadjopoulos et al.
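
For illustration, an antisense oligonucleotide of twenty nucleotides can be derived
from a messenger RNA sequence as the reverse complement of the targeted region. The
function name and the choice of a DNA oligonucleotide are assumptions made for this
sketch; no claim is made about which region of a transcript is an effective target.

```python
# Illustrative sketch: DNA antisense oligonucleotide complementary to a region
# of a messenger RNA. Input may be written with U or T.

RNA_TO_ANTISENSE_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna, start, length=20):
    """Return the DNA oligonucleotide (5'->3') complementary to
    mrna[start:start + length]."""
    region = mrna.upper().replace("T", "U")[start:start + length]
    if len(region) < length:
        raise ValueError("target region extends past the end of the mRNA")
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(region))
```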

The present invention encompasses isolated polypeptides and nucleic acids
derived from E. cloacae that are useful as reagents for diagnosis of bacterial infection,
components of effective anti-bacterial vaccines, and/or as targets for anti-bacterial
drugs, including anti-E. cloacae drugs.

Expression of E. cloacae Nucleic Acids

Table 2, which is appended herewith and which forms part of the present
specification, provides a list of open reading frames (ORFs) in both strands and a
putative identification of the particular function of a polypeptide which is encoded by
each ORF, based on the homology match (determined by the BLAST algorithm) of
the predicted polypeptide with known proteins encoded by ORFs in other organisms.
An ORF is a region of nucleic acid which encodes a polypeptide. This region may
represent a portion of a coding sequence or a total sequence and was determined from
stop to stop codons. The first column contains a designation for the contig from
which each ORF was identified (numbered arbitrarily). Each contig represents a
continuous stretch of the genomic sequence of the organism. The second column lists
the ORF designation. The third and fourth columns list the SEQ ID numbers for the
nucleic acid and amino acid sequences corresponding to each ORF, respectively. The
fifth and sixth columns list the length of the nucleic acid ORF and the length of the
amino acid ORF, respectively. The nucleotide sequence corresponding to each ORF
begins at the first nucleotide immediately following a stop codon and ends at the
nucleotide immediately preceding the next downstream stop codon in the same reading
frame. It will be recognized by one skilled in the art that the natural translation
initiation sites will correspond to ATG, GTG, or TTG codons located within the
ORFs. The natural initiation sites depend not only on the sequence of a start codon
but also on the context of the DNA sequence adjacent to the start codon. Usually, a
recognizable ribosome binding site is found within 20 nucleotides upstream from the
initiation codon. In some cases where genes are translationally coupled and
coordinately expressed together in "operons", ribosome binding sites are not present,
but the initiation codon of a downstream gene may occur very close to, or overlap, the
stop codon of an upstream gene in the same operon. The correct start codons can
be generally identified without undue experimentation because only a few codons need
be tested. It is recognized that the translational machinery in bacteria initiates all
polypeptide chains with the amino acid methionine, regardless of the sequence of the
start codon. In some cases, polypeptides are post-translationally modified, resulting in
an N-terminal amino acid other than methionine in vivo. The seventh and eighth
columns provide metrics for assessing the likelihood of the homology match
(determined by the BLASTP2 algorithm), as is known in the art, to the genes
indicated in the eleventh column when the designated ORF was compared against a
non-redundant comprehensive protein database. Specifically, the seventh column
represents the "Blast Score" for the match (a higher score is a better match), and the
eighth column represents the "P-value" for the match (the probability that such a
match could have occurred by chance; the lower the value, the more likely the match is
valid). If a BLASTP2 score of less than 46 was obtained, no value is reported in the
table for the "P-value". Column nine provides the name of the organism that was
identified as having the closest homology match. The tenth column provides, where
available, either a public database accession number or our own sequence name. The
eleventh column provides, where available, the Swissprot accession number
(SP), the locus name (LN), the Organism (OR), Source of variant (SR), E.C.
number (EC), the gene name (GN), the product name (PN), the Function Description
(FN), Left End (LE), Right End (RE), Coding Direction (DI), and the description (DE)
or notes (NT) for each ORF. Information that is not preceded by a code designation
in the eleventh column represents a description of the ORF. This information allows
one of ordinary skill in the art to determine a potential use for each identified coding
sequence and, as a result, allows one to use the polypeptides of the present invention
for commercial and industrial purposes.
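
The search for candidate translation initiation sites within a stop-to-stop ORF can
be sketched as follows. The sketch lists each in-frame ATG, GTG, or TTG codon and
flags whether a ribosome binding site-like motif lies within the 20 nucleotides
upstream. The function name and the purine-rich motif pattern (AGGA, GGAG, or GAGG)
are assumptions chosen for illustration; the specification does not define a motif.

```python
# Illustrative sketch: in-frame candidate start codons in a stop-to-stop ORF,
# each flagged for an upstream ribosome binding site-like motif.
import re

START_CODONS = ("ATG", "GTG", "TTG")
RBS_LIKE = re.compile("AGGA|GGAG|GAGG")  # assumed motif, for illustration only


def candidate_starts(seq, orf_start, orf_end, window=20):
    """Return (position, codon, has_rbs_like_motif) for each in-frame start codon.

    orf_start and orf_end are 0-based, end-exclusive coordinates of the ORF
    on `seq`; the upstream window may extend before orf_start.
    """
    seq = seq.upper()
    hits = []
    for i in range(orf_start, orf_end - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            upstream = seq[max(0, i - window):i]
            hits.append((i, codon, bool(RBS_LIKE.search(upstream))))
    return hits
```

Because only a few candidates are usually returned for a given ORF, each can be
tested in turn, consistent with the observation above that only a few codons need be
tested.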

Using the information provided in SEQ ID NO: 1 - SEQ ID NO: 5662, SEQ
ID NO: 5663 - SEQ ID NO: 11324 and in Table 2 together with routine cloning and
sequencing methods, one of ordinary skill in the art will be able to clone and
sequence all the nucleic acid fragments of interest including open reading frames
(ORFs) encoding a large variety of proteins of E. cloacae.

Nucleic acid isolated or synthesized in accordance with the sequences
described herein have utility to generate polypeptides. The nucleic acid of the
invention exemplified in SEQ ID NO: 1 - SEQ ID NO: 5662 and in Table 2 or
fragments of said nucleic acid encoding active portions of E. cloacae polypeptides can
be cloned into suitable vectors or used to isolate nucleic acid. The isolated nucleic
acid is combined with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a
bacterial strain under conditions where the activity of the gene product(s) specified by
the gene or operon in question can be specifically measured. Alternatively, a gene
product may be produced in large quantities in an expressing strain for use as an
antigen, an industrial reagent, for structural studies, etc. This expression can be
accomplished in a mutant strain which lacks the activity of the gene to be tested, or in
a strain that does not produce the same gene product(s). Such strains include, but are
not limited to, eucaryotic species such as the yeast Saccharomyces cerevisiae,
Methanobacterium strains or other Archaea, and Eubacteria such as E. coli, B.
subtilis, S. aureus, S. pneumoniae or Pseudomonas putida. In some cases the
expression host will utilize the natural E. cloacae promoter whereas in others, it will
be necessary to drive the gene with a promoter sequence derived from the expressing
organism (e.g., an E. coli beta-galactosidase promoter for expression in E. coli).

To express a gene product using the natural E. cloacae promoter, a procedure
such as the following can be used. A restriction fragment containing the gene of
interest, together with its associated natural promoter element and regulatory
sequences (identified using the DNA sequence data), is cloned into an appropriate
recombinant plasmid containing an origin of replication that functions in the host
organism and an appropriate selectable marker. This can be accomplished by a
number of procedures known to those skilled in the art. It is most preferably done by
cutting the plasmid and the fragment to be cloned with the same restriction enzyme to
produce compatible ends that can be ligated to join the two pieces together. The
recombinant plasmid is introduced into the host organism by, for example,
electroporation and cells containing the recombinant plasmid are identified by
selection for the marker on the plasmid. Expression of the desired gene product is
detected using an assay specific for that gene product.

In the case of a gene that requires a different promoter, the body of the gene
(coding sequence) is specifically excised and cloned into an appropriate expression
plasmid. This subcloning can be done by several methods, but is most easily
accomplished by PCR amplification of a specific fragment and ligation into an
expression plasmid after treating the PCR product with a restriction enzyme or
exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or
eucaryotic cell. Suitable methods for transforming host cells can be found in
Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring
Harbor Laboratory Press (1989)), and other laboratory textbooks.

For example, a host cell transfected with a nucleic acid vector directing
expression of a nucleotide sequence encoding an E. cloacae polypeptide can be
cultured under appropriate conditions to allow expression of the polypeptide to occur.
Suitable media for cell culture are well known in the art. Polypeptides of the
invention can be isolated from cell culture medium, host cells, or both using
techniques known in the art for purifying proteins including ion-exchange
chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and
immunoaffinity purification with antibodies specific for such polypeptides.
Additionally, in many situations, polypeptides can be produced by chemical or
enzymatic cleavage of a native protein (e.g., tryptic digestion) and the cleavage
products can then be purified by standard techniques.

In the case of membrane bound proteins, these can be isolated from a host cell
by contacting a membrane-associated protein fraction with a detergent forming a
solubilized complex, where the membrane-associated protein is no longer entirely
embedded in the membrane fraction and is solubilized at least to an extent which
allows it to be chromatographically isolated from the membrane fraction.
Chromatographic techniques which can be used in the final purification step are
known in the art and include hydrophobic interaction, lectin affinity, ion exchange,
dye affinity and immunoaffinity.

One strategy to maximize recombinant E. cloacae peptide expression in E. coli
is to express the protein in a host bacterium with an impaired capacity to proteolytically
cleave the recombinant protein (Gottesman, S., Gene Expression Technology: Methods
in Enzymology 185, Academic Press, San Diego, California (1990) 119-128).
Another strategy would be to alter the nucleic acid encoding an E. cloacae peptide to
be inserted into an expression vector so that the individual codons for each amino acid
would be those preferentially utilized in highly expressed E. coli proteins (Wada et al.,
(1992) Nuc. Acids Res. 20:2111-2118). Such alteration of nucleic acids of the
invention can be carried out by standard DNA synthesis techniques.
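
The codon substitution strategy can be sketched as a back-translation of an amino
acid sequence using one chosen codon per amino acid. The function name is an
assumption, and the PREFERRED table below is an illustrative placeholder only: its
entries are valid codons for each amino acid but are not values taken from Wada et
al. and should be replaced with codons drawn from an appropriate codon usage table.

```python
# Illustrative sketch: re-encode a polypeptide with one chosen codon per amino
# acid. PREFERRED is a placeholder table (assumption), not published data.

PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop
}


def back_translate(protein, table=PREFERRED):
    """Return a DNA sequence encoding `protein` using the codons in `table`."""
    try:
        return "".join(table[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
```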

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing
polydeoxynucleotides are known, including solid-phase synthesis which, like peptide
synthesis, has been fully automated in commercially available DNA synthesizers (See,
e.g., Itakura et al. U.S. Patent No. 4,598,049; Caruthers et al. U.S. Patent No.
4,458,066; and Itakura U.S. Patent Nos. 4,401,796 and 4,373,071, incorporated by
reference herein).

The present invention provides a library of E. cloacae-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which can be used as
markers in epidemiological studies. The present invention also provides a library of E.
cloacae-derived nucleic acid sequences which comprise or encode targets for
therapeutic drugs.

Nucleic acids comprising any of the sequences disclosed herein or
sub-sequences thereof can be prepared by standard methods using the nucleic acid
sequence information provided in SEQ ID NO: 1 - SEQ ID NO: 5662. For example,
DNA can be chemically synthesized using, e.g., the phosphoramidite solid support
method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et
al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done
by sequentially linking a series of oligonucleotide cassettes comprising pairs of
synthetic oligonucleotides, as described below.

Of course, due to the degeneracy of the genetic code, many different
nucleotide sequences can encode polypeptides having the amino acid sequences
defined by SEQ ID NO: 5663 - SEQ ID NO: 11324 or sub-sequences thereof. The
codons can be selected for optimal expression in prokaryotic or eukaryotic systems.
Such degenerate variants are also encompassed by this invention.
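
The extent of this degeneracy can be illustrated by counting the distinct nucleotide
sequences that encode a given amino acid sequence under the standard genetic code,
which is the product of the number of synonymous codons for each residue. The
function name is an assumption for this sketch.

```python
# Illustrative sketch: number of distinct coding sequences for a polypeptide
# under the standard genetic code (stop codon not included).
from math import prod

DEGENERACY = {
    **dict.fromkeys("LSR", 6),        # six synonymous codons each
    **dict.fromkeys("AGPTV", 4),      # four each
    "I": 3,
    **dict.fromkeys("CDEFHKNQY", 2),  # two each
    "M": 1,
    "W": 1,
}


def count_encodings(protein):
    """Return how many nucleotide sequences encode `protein`."""
    return prod(DEGENERACY[aa] for aa in protein.upper())
```

Even a tripeptide such as Met-Lys-Leu has twelve possible coding sequences, and the
number grows multiplicatively with polypeptide length.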

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the
invention into a vector is easily accomplished when the termini of both the DNAs and
the vector comprise compatible restriction sites. If this cannot be done, it may be
necessary to modify the termini of the DNAs and/or vector by digesting back
single-stranded DNA overhangs generated by restriction endonuclease cleavage to
produce blunt ends, or to achieve the same result by filling in the single-stranded
termini with an appropriate DNA polymerase.

Alternatively, any site desired may be produced, e.g., by ligating nucleotide
sequences (linkers) onto the termini. Such linkers may comprise specific
oligonucleotide sequences that define desired restriction sites. Restriction sites can
also be generated by the use of the polymerase chain reaction (PCR). See, e.g., Saiki
et al., 1988, Science 239:48. The cleaved vector and the DNA fragments may also be
modified if required by homopolymeric tailing.

The nucleic acids of the invention may be isolated directly from cells.
Alternatively, the polymerase chain reaction (PCR) method can be used to produce the
nucleic acids of the invention, using either chemically synthesized strands or genomic
material as templates. Primers used for PCR can be synthesized using the sequence
information provided herein and can further be designed to introduce appropriate new
restriction sites, if desirable, to facilitate incorporation into a given vector for
recombinant expression.
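
A minimal sketch of designing PCR primers that introduce new restriction sites is
given below. Each primer carries, 5' of its annealing region, a restriction enzyme
recognition sequence preceded by a short clamp. The function name, the two example
enzymes (EcoRI, GAATTC; BamHI, GGATCC), the two-base clamp, and the 20-nucleotide
annealing length are assumptions chosen for illustration. The sketch rejects inserts
that already contain either recognition sequence, since such inserts would be cut
internally.

```python
# Illustrative sketch: PCR primers with 5' tails that add restriction sites.
# Enzyme choice, clamp, and annealing length are assumptions for this example.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}  # both sites are palindromic


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def tailed_primers(insert, fwd_site="EcoRI", rev_site="BamHI", anneal=20, clamp="GC"):
    """Return (forward, reverse) primers carrying the requested sites."""
    insert = insert.upper()
    if len(insert) < anneal:
        raise ValueError("insert shorter than annealing length")
    for name in (fwd_site, rev_site):
        if SITES[name] in insert:
            raise ValueError(f"insert already contains a {name} site")
    forward = clamp + SITES[fwd_site] + insert[:anneal]
    reverse = clamp + SITES[rev_site] + reverse_complement(insert[-anneal:])
    return forward, reverse
```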

The nucleic acids of the present invention may be flanked by natural E.
cloacae regulatory sequences, or may be associated with heterologous sequences,
including promoters, enhancers, response elements, signal sequences, polyadenylation
sequences, introns, 5'- and 3'-noncoding regions, and the like. The nucleic acids may
also be modified by many means known in the art. Non-limiting examples of such
modifications include methylation, "caps", substitution of one or more of the naturally
occurring nucleotides with an analog, internucleotide modifications such as, for
example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoramidates, carbamates, etc.) and with charged linkages (e.g.,
phosphorothioates, phosphorodithioates, etc.). Nucleic acids may contain one or more
additional covalently linked moieties, such as, for example, proteins (e.g., nucleases,
toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine,
psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.),
and alkylators. PNAs are also included. The nucleic acid may be derivatized by
formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage.
Furthermore, the nucleic acid sequences of the present invention may also be modified
with a label capable of providing a detectable signal, either directly or indirectly.
Exemplary labels include radioisotopes, fluorescent molecules, biotin, and the like.

The invention also provides nucleic acid vectors comprising the disclosed E.
cloacae-derived sequences or derivatives or fragments thereof. A large number of
vectors, including plasmid and bacterial vectors, have been described for replication
and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used
for cloning or protein expression.

The encoded E. cloacae polypeptides may be expressed by using many known
vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, WI), or
pRSET or pREP (Invitrogen, San Diego, CA), and many appropriate host cells, using
methods disclosed or cited herein or otherwise known to those skilled in the relevant
art. The particular choice of vector/host is not critical to the practice of the invention.

Recombinant cloning vectors will often include one or more replication
systems for cloning or expression, one or more markers for selection in the host, e.g.
antibiotic resistance, and one or more expression cassettes. The inserted E. cloacae
coding sequences may be synthesized by standard methods, isolated from natural
sources, or prepared as hybrids, etc. Ligation of the E. cloacae coding sequences to
transcriptional regulatory elements and/or to other amino acid coding sequences may
be achieved by known methods. Suitable host cells may be
transformed/transfected/infected as appropriate by any suitable method including
electroporation, CaCl2-mediated DNA uptake, bacterial infection, microinjection,
microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast,
and plant and animal cells, especially mammalian cells. Of particular interest are E.
cloacae, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces carlsbergensis,
Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells, Neurospora, and CHO
cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell
lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda,
adenovirus, and the like. A large number of transcription initiation and termination
regulatory regions have been isolated and shown to be effective in the transcription
and translation of heterologous proteins in the various hosts. Examples of these
regions, methods of isolation, manner of manipulation, etc. are known in the art.
Under appropriate expression conditions, host cells can be used as a source of
recombinantly produced E. cloacae-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element
(i.e., a promoter) operably linked to the E. cloacae portion. The promoter may
optionally contain operator portions and/or ribosome binding sites. Non-limiting
examples of bacterial promoters compatible with E. coli include: beta-lactamase
(penicillinase) promoter; lactose promoter; tryptophan (trp) promoter; araBAD
(arabinose) operon promoter; lambda-derived PL promoter and N gene ribosome
binding site; and the hybrid tac promoter derived from sequences of the trp and lac
UV5 promoters. Non-limiting examples of yeast promoters include
3-phosphoglycerate kinase promoter, glyceraldehyde-3-phosphate dehydrogenase
(GAPDH) promoter, galactokinase (GAL1) promoter, galactoepimerase promoter, and
alcohol dehydrogenase (ADH) promoter. Suitable promoters for mammalian cells
include without limitation viral promoters such as that from Simian Virus 40 (SV40),
Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV).
Mammalian cells may also require terminator sequences, polyA addition sequences
and enhancer sequences to increase expression. Sequences which cause amplification
of the gene may also be desirable. Furthermore, sequences that facilitate secretion of
the recombinant product from cells, including, but not limited to, bacteria, yeast, and
animal cells, such as secretory signal sequences and/or prohormone pro region
sequences, may also be included. These sequences are well described in the art.

Nucleic acids encoding wild-type or variant E. cloacae-derived polypeptides
may also be introduced into cells by recombination events. For example, such a
sequence can be introduced into a cell, and thereby effect homologous recombination
at the site of an endogenous gene or a sequence with substantial identity to the gene.
Other recombination-based methods such as nonhomologous recombinations or
deletion of endogenous genes by homologous recombination may also be used.

The nucleic acids of the present invention find use as templates for the
recombinant production of E. cloacae-derived peptides or polypeptides.

Identification and Use of E. cloacae Nucleic Acid Sequences 

The disclosed E. cloacae polypeptide and nucleic acid sequences, or other
sequences that are contained within ORFs, including complete protein-coding
sequences, of which any of the disclosed E. cloacae-specific sequences forms a part,
are useful as target components for diagnosis and/or treatment of E. cloacae-caused
infection.

It will be understood that the sequence of an entire protein-coding sequence of
which each disclosed nucleic acid sequence forms a part can be isolated and identified
based on each disclosed sequence. This can be achieved, for example, by using an
isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a
sequencing reaction with genomic E. cloacae DNA as template; this is followed by
sequencing the amplified product. The isolated nucleic acid encoding the disclosed
sequence, or fragments thereof, can also be hybridized to E. cloacae genomic libraries
to identify clones containing additional complete segments of the protein-coding
sequence of which the shorter sequence forms a part. Then, the entire protein-coding
sequence, or fragments thereof, or nucleic acids encoding all or part of the sequence,
or sequence-conservative or function-conservative variants thereof, may be employed
in practicing the present invention.

25 Preferred sequences are those that are useful in diagnostic and/or therapeutic 

applications. Diagnostic applications include without limitation nucleic-acid-based and 
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antibody-based methods for detecting bacterial infection. Therapeutic applications 
include without limitation vaccines, passive immunotherapy, and drug treatments 
directed against gene products that are both unique to bacteria and essential for growth 
and/or replication of bacteria. 

Identification of Nucleic Acids Encoding Vaccine Components and Targets for Agents 
Effective Against E. cloacae 

The disclosed E. cloacae genome sequence includes segments that direct the 
synthesis of ribonucleic acids and polypeptides, as well as origins of replication,
promoters, other types of regulatory sequences, and intergenic nucleic acids. The
invention encompasses nucleic acids encoding immunogenic components of vaccines
and targets for agents effective against E. cloacae. Identification of said immunogenic
components involves the determination of the function of the disclosed sequences,
which can be achieved using a variety of approaches. Non-limiting examples of these
approaches are described briefly below. 

Homology to known sequences: 

Computer-assisted comparison of the disclosed E. cloacae sequences with
previously reported sequences present in publicly available databases is useful for
identifying functional E. cloacae nucleic acid and polypeptide sequences. It will be 
understood that protein-coding sequences, for example, may be compared as a whole, 
and that a high degree of sequence homology between two proteins (such as, for
example, >80-90%) at the amino acid level indicates that the two proteins also possess
some degree of functional homology, such as, for example, among enzymes involved
in metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in
transport, cell division, etc. In addition, many structural features of particular protein
classes have been identified and correlate with specific consensus sequences, such as, 
for example, binding domains for nucleotides, DNA, metal ions, and other small 
molecules; sites for covalent modifications such as phosphorylation, acylation, and the 
like; sites of protein:protein interactions, etc. These consensus sequences may be
quite short and thus may represent only a fraction of the entire protein-coding 
sequence. Identification of such a feature in an E. cloacae sequence is therefore 
useful in determining the function of the encoded protein and identifying useful targets 
of antibacterial drugs. 
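
By way of illustration only, a search for a short consensus sequence can be sketched as follows. The sketch scans a protein sequence for the nucleotide-binding P-loop (Walker A) consensus, [AG]-x(4)-G-K-[ST]. The choice of motif, the function name, and the example sequence are assumptions made for illustration and are not taken from the present disclosure.

```python
import re

# Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST], written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_motif(protein, pattern=P_LOOP):
    """Return (start, matched_text) for each non-overlapping match; start is 0-based."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

# Hypothetical sequence, for illustration only.
example = "MSTLKGPRGSGKSTLAEIR"
hits = find_motif(example)
```

A match of this kind is only a hint of function; as noted above, the consensus covers a small fraction of the protein-coding sequence.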

Of particular relevance to the present invention are structural features that are
common to secretory, transmembrane, and surface proteins, including secretion signal
peptides and hydrophobic transmembrane domains. E. cloacae proteins identified as
containing putative signal sequences and/or transmembrane domains are useful as
immunogenic components of vaccines.
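
One conventional way to flag candidate hydrophobic transmembrane segments is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale; the window of 19 residues and the cutoff of 1.6 are commonly used values chosen here as assumptions, and the function name and example sequence are hypothetical. It is not presented as the method used to analyze the disclosed sequences.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold."""
    starts = []
    for i in range(len(protein) - window + 1):
        mean = sum(KD[aa] for aa in protein[i:i + window]) / window
        if mean > threshold:
            starts.append(i)
    return starts

# Hypothetical sequence: charged residues flanking a 20-residue hydrophobic stretch.
example = "MKKRDE" + "LLVIALLGAVLIALLVAGLI" + "RRKDEE"
candidate_starts = hydrophobic_windows(example)
```

Windows that pass the cutoff mark regions worth closer inspection; they do not by themselves establish that a protein is membrane-associated.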

Targets for therapeutic drugs according to the invention include, but are not
limited to, polypeptides of the invention, whether unique to E. cloacae or not, that are
essential for growth and/or viability of E. cloacae under at least one growth condition. 
Polypeptides essential for growth and/or viability can be determined by examining the 
effect of deleting and/or disrupting the genes, i.e., by so-called gene "knockout".
Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. Natl. Acad.
Sci. USA 92:6479-6483; Published International Application WO 94/26933; U.S.
Patent No. 5,612,180). Still other methods for assessing essentiality include the
ability to isolate conditional lethal mutations in the specific gene (e.g., temperature
sensitive mutations). Other useful targets for therapeutic drugs, which include
polypeptides that are not essential for growth or viability per se but lead to loss of
viability of the cell, can be used to target therapeutic agents to cells. 



Strain-specific sequences: 

Because of the evolutionary relationship between different E. cloacae strains, it 
is believed that the presently disclosed E. cloacae sequences are useful for identifying, 
and/or discriminating between, previously known and new E. cloacae strains. It is 
believed that other E. cloacae strains will exhibit at least about 70% sequence
homology with the presently disclosed sequence. Systematic and routine analyses of
DNA sequences derived from samples containing E. cloacae strains, and comparison
with the present sequence, allow for the identification of sequences that can be used
to discriminate between strains, as well as those that are common to all E. cloacae
strains. In one embodiment, the invention provides nucleic acids, including probes,
and peptide and polypeptide sequences that discriminate between different strains of 
E. cloacae. Strain-specific components can also be identified functionally by their 
ability to elicit or react with antibodies that selectively recognize one or more E. 
cloacae strains. 

In another embodiment, the invention provides nucleic acids, including probes,
and peptide and polypeptide sequences that are common to all E. cloacae strains but
are not found in other bacterial species. 
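
The distinction between sequences common to all strains and sequences that discriminate between strains can be sketched with short subsequences (k-mers) and set operations. This is a minimal illustration under stated assumptions: the function names, the toy sequences, and the value of k are hypothetical, and a real comparison would use far longer sequences and alignment-based methods.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def compare_strains(strains, k=8):
    """strains maps a strain name to a DNA sequence.

    Returns (common, specific): the k-mers shared by every strain, and a dict
    mapping each strain to the k-mers found in that strain only.
    """
    sets = {name: kmers(seq, k) for name, seq in strains.items()}
    common = set.intersection(*sets.values())
    specific = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        specific[name] = own - others
    return common, specific

# Hypothetical toy sequences differing at a single position.
toy = {
    "strain_A": "ATGGCGTTAACCGGT",
    "strain_B": "ATGGCGTTTACCGGT",
}
common, specific = compare_strains(toy, k=5)
```

Subsequences in `common` are candidates for species-level probes, while those in `specific` are candidates for strain-discriminating probes, subject to checking against other bacterial species.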

E. cloacae Polypeptides 

This invention encompasses isolated E. cloacae polypeptides encoded by the 
disclosed E. cloacae genomic sequences, including the polypeptides of the invention 
contained in the Sequence Listing. Polypeptides of the invention are preferably at 
least about 5 amino acid residues in length. Using the DNA sequence information 
provided herein, the amino acid sequences of the polypeptides encompassed by the
invention can be deduced using methods well-known in the art. It will be understood
that the sequence of an entire nucleic acid encoding an E. cloacae polypeptide can be
isolated and identified based on an ORF that encodes only a fragment of the cognate
protein-coding region. This can be achieved, for example, by using the isolated
nucleic acid encoding the ORF, or fragments thereof, to prime a polymerase chain
reaction with genomic E. cloacae DNA as template; this is followed by sequencing
the amplified product. 
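
Deducing an amino acid sequence from a DNA sequence amounts to reading codons against the genetic code. The following sketch translates a coding strand in a single reading frame using the standard genetic code; the function name and the example sequence are illustrative assumptions, and alternative start codons and other reading frames are not handled.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, to_stop=True):
    """Translate a DNA coding strand in frame 0; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop codon ends the polypeptide
        protein.append(aa)
    return "".join(protein)

# Hypothetical 12-nucleotide coding sequence: ATG GCC AAA TAA.
peptide = translate("ATGGCCAAATAA")
```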

The polypeptides of the present invention, including function-conservative 
variants of the disclosed ORFs, may be isolated from wild-type or mutant E. cloacae 
cells, or from heterologous organisms or cells (including, but not limited to, bacteria,
fungi, insect, plant, and mammalian cells) including E. cloacae into which an E.
cloacae-derived protein-coding sequence has been introduced and expressed. 
Furthermore, the polypeptides may be part of recombinant fusion proteins. 

E. cloacae polypeptides of the invention can be chemically synthesized using
commercially available automated procedures such as those referenced herein, including,
without limitation, exclusive solid phase synthesis, partial solid phase methods,
fragment condensation or classical solution synthesis. The polypeptides are preferably
prepared by solid phase peptide synthesis as described by Merrifield, 1963, J. Am.
Chem. Soc. 85:2149. The synthesis is carried out with amino acids that are protected
at the alpha-amino terminus. Trifunctional amino acids with labile side-chains are
also protected with suitable groups to prevent undesired chemical reactions from
occurring during the assembly of the polypeptides. The alpha-amino protecting group
is selectively removed to allow subsequent reaction to take place at the amino- 
terminus. The conditions for the removal of the alpha-amino protecting group do not 
remove the side-chain protecting groups. 

Methods for polypeptide purification are well-known in the art, including,
without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC,
reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and
countercurrent distribution. For some purposes, it is preferable to produce the 
polypeptide in a recombinant system in which the E. cloacae protein contains an 
additional sequence tag that facilitates purification, such as, but not limited to, a 
polyhistidine sequence. The polypeptide can then be purified from a crude lysate of
the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, 
antibodies produced against an E. cloacae protein or against peptides derived 
therefrom can be used as purification reagents. Other purification methods are 
possible. 

The present invention also encompasses derivatives and homologues of E.
cloacae-encoded polypeptides. For some purposes, nucleic acid sequences encoding
the peptides may be altered by substitutions, additions, or deletions that provide for 
functionally equivalent molecules, i.e., function-conservative variants. For example, 
one or more amino acid residues within the sequence can be substituted by another
amino acid of similar properties, such as, for example, positively charged amino acids
(arginine, lysine, and histidine); negatively charged amino acids (aspartate and 
glutamate); polar neutral amino acids; and non-polar amino acids. 
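
The notion of a function-conservative substitution can be sketched as a class-membership test. The charged classes below follow the text above; the membership of the polar neutral and non-polar classes is not given in this passage and is an assumed, conventional grouping, and the function names are hypothetical.

```python
# Residue classes. The two charged classes follow the text above; the polar
# neutral and non-polar memberships are an assumed, conventional grouping.
CLASSES = {
    "positive": set("RKH"),
    "negative": set("DE"),
    "polar_neutral": set("GSTCYNQ"),
    "non_polar": set("ALIVPFWM"),
}

def residue_class(aa):
    """Return the name of the class containing the one-letter residue code."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")

def is_conservative(original, replacement):
    """True if both residues fall in the same class."""
    return residue_class(original) == residue_class(replacement)
```

For example, `is_conservative("K", "R")` is true under this grouping, whereas `is_conservative("K", "E")` is false. Whether a given substitution actually preserves function must still be tested experimentally.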

The isolated polypeptides may be modified by, for example, phosphorylation, 
sulfation, acylation, or other protein modifications. They may also be modified with a
label capable of providing a detectable signal, either directly or indirectly, including,
but not limited to, radioisotopes and fluorescent compounds. 

To identify E. cloacae-derived polypeptides for use in the present invention,
essentially the complete genomic sequence of a virulent, methicillin-resistant isolate of
Enterobacter cloacae was analyzed. While, in very rare instances, a nucleic
acid sequencing error may be revealed, resolving a rare sequencing error is well
within the skill in the art, and such an occurrence will not prevent one skilled in the art from
practicing the invention. 

Also encompassed are any E. cloacae polypeptide sequences that are contained 
within the open reading frames (ORFs), including complete protein-coding sequences, 
of which any of SEQ ID NO: 1 - SEQ ID NO: 5662 forms a part. Table 2, which is
appended herewith and which forms part of the present specification, provides a
putative identification of the particular function of a polypeptide which is encoded by
each ORF, based on the homology match (determined by the BLAST algorithm) of
the predicted polypeptide with known proteins encoded by ORFs in other organisms.
As a result, one skilled in the art can use the polypeptides of the present invention for
commercial and industrial purposes consistent with the type of putative identification 
of the polypeptide. 

The present invention provides a library of E. cloacae-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise sequences that are contemplated for use as 
components of vaccines. Non-limiting examples of such sequences are listed by SEQ 
ID NO in Table 2, which is appended herewith and which forms part of the present 
specification. 

The present invention also provides a library of E. cloacae-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise sequences lacking homology to any known
prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and
markers which can be used to diagnose E. cloacae infection, including use as markers
in epidemiological studies. Non-limiting examples of such sequences are listed by
SEQ ID NO in Table 2, which is appended herewith and which forms part of the present
specification.

The present invention also provides a library of E. cloacae-derived polypeptide 
sequences, and a corresponding library of nucleic acid sequences encoding the 
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise targets for therapeutic drugs. 

Specific Example: Determination Of Enterobacter Protein Antigens For Antibody 
And Vaccine Development 

The selection of Enterobacter protein antigens for vaccine development can be 
derived from the nucleic acids encoding E. cloacae polypeptides. First, the ORFs can 
be analyzed for homology to other known exported or membrane proteins and 
analyzed using the discriminant analysis described by Klein, et al. (Klein, P.,
Kanehisa, M., and DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for
predicting exported and membrane proteins. 

Homology searches can be performed using the BLAST algorithm contained in 
the Wisconsin Sequence Analysis Package (Genetics Computer Group, University 
Research Park, 575 Science Drive, Madison, WI 53711) to compare each predicted
ORF amino acid sequence with all sequences found in the current GenBank, SWISS-PROT
and PIR databases. BLAST searches for local alignments between the ORF
and the databank sequences and reports a probability score which indicates the
probability of finding this sequence by chance in the database. ORFs with significant
homology (e.g., probabilities lower than 1 x 10^-6 that the homology is only due to
random chance) to membrane or exported proteins represent protein antigens for
vaccine development. Possible functions can be assigned to E. cloacae genes based
on sequence homology to genes cloned in other organisms. 
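
The selection step described above, keeping ORFs whose best matches fall below a probability cutoff and describe membrane or exported proteins, can be sketched as a simple filter. The record layout, ORF identifiers, descriptions, and probability values below are hypothetical; parsing of actual search output is not shown.

```python
KEYWORDS = ("membrane", "exported")

def candidate_antigens(hits, max_probability=1e-6, keywords=KEYWORDS):
    """hits: iterable of (orf_id, subject_description, probability) tuples.

    Returns the sorted ORF identifiers having at least one hit below the
    probability cutoff whose description contains one of the keywords.
    """
    selected = set()
    for orf_id, description, probability in hits:
        text = description.lower()
        if probability < max_probability and any(k in text for k in keywords):
            selected.add(orf_id)
    return sorted(selected)

# Hypothetical search results, for illustration only.
hits = [
    ("ORF_0001", "outer membrane protein A", 3e-40),
    ("ORF_0002", "DNA polymerase III subunit", 1e-80),
    ("ORF_0003", "putative exported protein", 2e-4),
]
selected = candidate_antigens(hits)
```

Keyword matching on description text is crude; in practice the annotations of the matched database entries would be reviewed individually.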

Discriminant analysis (Klein, et al. supra) can be used to examine the ORF 
amino acid sequences. This algorithm uses the intrinsic information contained in the 
ORF amino acid sequence and compares it to information derived from the properties
of known membrane and exported proteins. This comparison predicts which proteins 
will be exported, membrane associated or cytoplasmic. ORF amino acid sequences 
identified as exported or membrane associated by this algorithm are likely protein 
antigens for vaccine development. 

Production of Fragments and Analogs of E. cloacae Nucleic Acids and Polypeptides 

Based on the discovery of the E. cloacae gene products of the invention 
provided in the Sequence Listing, one skilled in the art can alter the disclosed
structure of E. cloacae genes, e.g., by producing fragments or analogs, and test the
newly produced structures for activity. Examples of techniques known to those 
skilled in the relevant art which allow the production and testing of fragments and 
analogs are discussed below. These, or analogous methods, can be used to make and
screen libraries of polypeptides, e.g., libraries of random peptides or libraries of
fragments or analogs of cellular proteins for the ability to bind E. cloacae
polypeptides. Such screens are useful for the identification of inhibitors of E. cloacae.



Generation of Fragments 

Fragments of a protein can be produced in several ways, e.g., recombinantly, 
by proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a
polypeptide can be generated by removing one or more nucleotides from one end (for
a terminal fragment) or both ends (for an internal fragment) of a nucleic acid which
encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide
fragments. Digestion with "end-nibbling" endonucleases can thus generate DNAs
which encode an array of fragments. DNAs which encode fragments of a protein can
also be generated by random shearing, restriction digestion or a combination of the
above-discussed methods. 

Fragments can also be chemically synthesized using techniques known in the 
art such as conventional Merrifield solid phase f-Moc or t-Boc chemistry. For 
example, peptides of the present invention may be arbitrarily divided into fragments of 
desired length with no overlap of the fragments, or divided into overlapping fragments
of a desired length. 
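
The division of a peptide into non-overlapping or overlapping fragments of a desired length can be sketched as follows. The function name, its parameters, and the example peptide are illustrative assumptions.

```python
def fragments(peptide, length, overlap=0):
    """Split peptide into fragments of the given length.

    overlap is the number of residues shared by consecutive fragments;
    overlap=0 gives non-overlapping fragments. The last fragment may be shorter.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap
    out = []
    for start in range(0, len(peptide), step):
        out.append(peptide[start:start + length])
        if start + length >= len(peptide):
            break  # the end of the peptide has been covered
    return out

# Hypothetical 10-residue peptide written with placeholder letters.
non_overlapping = fragments("ABCDEFGHIJ", 4)
overlapping = fragments("ABCDEFGHIJ", 4, overlap=2)
```

With an overlap of `length - 1`, every window of the desired length is produced, which is the densest possible tiling of the parent sequence.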

Alteration of Nucleic Acids and Polypeptides: Random Methods 

Amino acid sequence variants of a protein can be prepared by random 
mutagenesis of DNA which encodes a protein or a particular domain or region of a
protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A
library of random amino acid sequence variants can also be generated by the synthesis
of a set of degenerate oligonucleotide sequences. (Methods for screening proteins in a
library of variants are described elsewhere herein.)

PCR Mutagenesis 

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce
random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique
1:11-15). The DNA region to be mutagenized is amplified using the polymerase
chain reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by
Taq DNA polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn2+ to
the PCR reaction. The pool of amplified DNA fragments is inserted into appropriate
cloning vectors to provide random mutant libraries.

Saturation Mutagenesis 

Saturation mutagenesis allows for the rapid introduction of a large number of
single base substitutions into cloned DNA fragments (Myers et al., 1985, Science
229:242). This technique includes generation of mutations, e.g., by chemical
treatment or irradiation of single-stranded DNA in vitro, and synthesis of a
complementary DNA strand. The mutation frequency can be modulated by
modulating the severity of the treatment, and essentially all possible base substitutions
can be obtained. Because this procedure does not involve a genetic selection for
mutant fragments, both neutral substitutions and those that alter function are
obtained. The distribution of point mutations is not biased toward conserved sequence 
elements. 

Degenerate Oligonucleotides 

A library of homologs can also be generated from a set of degenerate
oligonucleotide sequences. Chemical synthesis of a degenerate gene sequence can be
carried out in an automatic DNA synthesizer, and the synthetic genes then ligated into
an appropriate expression vector. The synthesis of degenerate oligonucleotides is
known in the art (see, for example, Narang, SA (1983) Tetrahedron 39:3; Itakura et al.
(1981) Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. AG
Walton, Amsterdam: Elsevier pp273-289; Itakura et al. (1984) Annu. Rev. Biochem.
53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res.
11:477). Such techniques have been employed in the directed evolution of other
proteins (see, for example, Scott et al. (1990) Science 249:386-390; Roberts et al.
(1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al.
(1990) PNAS 87: 6378-6382; as well as U.S. Patents Nos. 5,223,409, 5,198,346, and 
5,096,815). 
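
A degenerate oligonucleotide is conveniently written with IUPAC ambiguity codes, each of which stands for a set of nucleotides. The sketch below enumerates the unambiguous sequences in such a pool and counts them; the function names and example sequences are illustrative assumptions.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(degenerate):
    """List every unambiguous sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return ["".join(p) for p in product(*choices)]

def complexity(degenerate):
    """Number of distinct sequences in the degenerate pool."""
    n = 1
    for base in degenerate.upper():
        n *= len(IUPAC[base])
    return n

pool = expand("AYG")
```

Because the pool size multiplies at every degenerate position, `complexity` is worth checking before `expand` is called on a long sequence; a single NNK codon already corresponds to 32 sequences.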

Alteration of Nucleic Acids and Polypeptides: Methods for Directed Mutagenesis

Non-random, or directed, mutagenesis techniques can be used to provide
specific sequences or mutations in specific regions. These techniques can be used to
create variants which include, e.g., deletions, insertions, or substitutions of residues of
the known amino acid sequence of a protein. The sites for mutation can be modified
individually or in series, e.g., by (1) substituting first with conserved amino acids and
then with more radical choices depending upon results achieved, (2) deleting the target 
residue, or (3) inserting residues of the same or a different class adjacent to the 
located site, or combinations of options 1-3. 

Alanine Scanning Mutagenesis

Alanine scanning mutagenesis is a useful method for identification of certain
residues or regions of the desired protein that are preferred locations or domains for
mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine
scanning, a residue or group of target residues is identified (e.g., charged residues
such as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged
amino acid (most preferably alanine or polyalanine). Replacement of an amino acid
can affect the interaction of the amino acids with the surrounding aqueous
environment in or outside the cell. Those domains demonstrating functional
sensitivity to the substitutions are then refined by introducing further or other variants
at or for the sites of substitution. Thus, while the site for introducing an amino acid
sequence variation is predetermined, the nature of the mutation per se need not be
predetermined. For example, to optimize the performance of a mutation at a given
site, alanine scanning or random mutagenesis may be conducted at the target codon or 
region and the expressed desired protein subunit variants are screened for the optimal 
combination of desired activity. 
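
The enumeration of single-residue alanine substitutions at charged positions can be sketched as follows. The target set follows the charged residues named above; the function name, labeling convention, and example peptide are illustrative assumptions.

```python
CHARGED = set("RDHKE")  # Arg, Asp, His, Lys, Glu, as listed in the text above

def alanine_scan(protein, targets=CHARGED):
    """Return (label, variant) pairs, one single-residue Ala substitution per target.

    Labels use 1-based positions, e.g. 'K2A'.
    """
    variants = []
    for i, aa in enumerate(protein):
        if aa in targets and aa != "A":
            variant = protein[:i] + "A" + protein[i + 1:]
            variants.append((f"{aa}{i + 1}A", variant))
    return variants

# Hypothetical peptide, for illustration only.
variants = alanine_scan("MKTEL")
```

Each variant would then be expressed and assayed; positions at which the substitution changes activity are the ones refined further.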

Oligonucleotide-Mediated Mutagenesis 

Oligonucleotide-mediated mutagenesis is a useful method for preparing
substitution, deletion, and insertion variants of DNA; see, e.g., Adelman et al. (DNA
2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide
encoding a mutation to a DNA template, where the template is the single-stranded
form of a plasmid or bacteriophage containing the unaltered or native DNA sequence
of the desired protein. After hybridization, a DNA polymerase is used to synthesize
an entire second complementary strand of the template that will thus incorporate the
oligonucleotide primer, and will code for the selected alteration in the desired protein
DNA. Generally, oligonucleotides of at least about 25 nucleotides in length are used.
An optimal oligonucleotide will have 12 to 15 nucleotides that are completely
complementary to the template on either side of the nucleotide(s) coding for the
mutation. This ensures that the oligonucleotide will hybridize properly to the single-stranded
DNA template molecule. The oligonucleotides are readily synthesized using
techniques known in the art such as that described by Crea et al. (Proc. Natl. Acad.
Sci. USA, 75:5765 [1978]).
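
The length guidance above (12 to 15 matching nucleotides on each side of the altered site, and at least about 25 nucleotides overall) can be sketched as a small design helper. The function name and arguments are hypothetical. The sketch assumes that the input sequence is written in the same sense as the oligonucleotide, i.e., as the complement of the single-stranded template to which the oligonucleotide will hybridize.

```python
def mutagenic_oligo(sequence, start, end, replacement, flank=15):
    """Build an oligonucleotide carrying `replacement` in place of sequence[start:end].

    start and end are 0-based, end-exclusive. `flank` unaltered nucleotides are
    kept on each side of the altered site.
    """
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    if start < flank or end + flank > len(sequence):
        raise ValueError("not enough sequence to supply both flanks")
    oligo = sequence[start - flank:start] + replacement + sequence[end:end + flank]
    if len(oligo) < 25:
        raise ValueError("oligonucleotide is shorter than about 25 nucleotides")
    return oligo

# Hypothetical sequence: change the codon GAA (Glu) to GCA (Ala).
sequence = "A" * 20 + "GAA" + "C" * 20
oligo = mutagenic_oligo(sequence, 20, 23, "GCA", flank=15)
```

An empty `replacement` models a deletion and a longer one an insertion; in either case the flanks still come from the unaltered sequence.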



Cassette Mutagenesis 

Another method for preparing variants, cassette mutagenesis, is based on the 
technique described by Wells et al. (Gene, 34:315 [1985]). The starting material is a
plasmid (or other vector) which includes the protein subunit DNA to be mutated. The
codon(s) in the protein subunit DNA to be mutated are identified. There must be a
unique restriction endonuclease site on each side of the identified mutation site(s). If
no such restriction sites exist, they may be generated using the above-described
oligonucleotide-mediated mutagenesis method to introduce them at appropriate
locations in the desired protein subunit DNA. After the restriction sites have been
introduced into the plasmid, the plasmid is cut at these sites to linearize it. A double-stranded
oligonucleotide encoding the sequence of the DNA between the restriction
sites but containing the desired mutation(s) is synthesized using standard procedures.
The two strands are synthesized separately and then hybridized together using standard
techniques. This double-stranded oligonucleotide is referred to as the cassette. This
cassette is designed to have 3' and 5' ends that are compatible with the ends of the
linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid 
now contains the mutated desired protein subunit DNA sequence. 

Combinatorial Mutagenesis

Combinatorial mutagenesis can also be used to generate mutants (Ladner et al., 
WO 88/06630). In this method, the amino acid sequences for a group of homologs or 
other related proteins are aligned, preferably to promote the highest homology 
possible. All of the amino acids which appear at a given position of the aligned
sequences can be selected to create a degenerate set of combinatorial sequences. The
variegated library of variants is generated by combinatorial mutagenesis at the nucleic 
acid level, and is encoded by a variegated gene library. For example, a mixture of 
synthetic oligonucleotides can be enzymatically ligated into gene sequences such that 
the degenerate set of potential sequences are expressible as individual peptides, or
alternatively, as a set of larger fusion proteins containing the set of degenerate
sequences. 
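
The construction of a degenerate set from aligned homologs can be sketched at the amino acid level: the residues observed at each aligned position are collected, and all combinations are enumerated. The function name and the aligned example sequences are hypothetical, and gaps in the alignment are not handled.

```python
from itertools import product

def combinatorial_library(aligned):
    """aligned: equal-length, gap-free aligned amino acid sequences.

    Returns (positions, variants): the sorted residues seen at each aligned
    position, and every sequence obtainable by combining them.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    positions = [sorted(set(column)) for column in zip(*aligned)]
    variants = ["".join(p) for p in product(*positions)]
    return positions, variants

# Hypothetical aligned homologs, for illustration only.
positions, variants = combinatorial_library(["MKVL", "MRVL", "MKIL"])
```

Note that the resulting library contains combinations absent from the input homologs (here, the sequence MRIL), which is the point of the combinatorial approach.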

Other Modifications of E. cloacae Nucleic Acids and Polypeptides

It is possible to modify the structure of an E. cloacae polypeptide for such
purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and
resistance to proteolytic degradation in vivo). A modified E. cloacae protein or
peptide can be produced in which the amino acid sequence has been altered, such as 
by amino acid substitution, deletion, or addition as described herein. 

An E. cloacae peptide can also be modified by substitution of cysteine residues 
preferably with alanine, serine, threonine, leucine or glutamic acid residues to
minimize dimerization via disulfide linkages. In addition, amino acid side chains of
fragments of the protein of the invention can be chemically modified. Another 
modification is cyclization of the peptide. 

In order to enhance stability and/or reactivity, an E. cloacae polypeptide can be 
modified to incorporate one or more polymorphisms in the amino acid sequence of the
protein resulting from any natural allelic variation. Additionally, D-amino acids, non-
natural amino acids, or non-amino acid analogs can be substituted or added to produce 
a modified protein within the scope of this invention. Furthermore, an E. cloacae 
polypeptide can be modified using polyethylene glycol (PEG) according to the method 
of A. Sehon and co-workers (Wie et al., supra) to produce a protein conjugated with
PEG. In addition, PEG can be added during chemical synthesis of the protein. Other
modifications of E. cloacae proteins include reduction/alkylation (Tarr, Methods of 
Protein Microcharacterization, J. E. Silver ed., Humana Press, Clifton NJ 155-194 
(1986)); acylation (Tarr, supra); chemical coupling to an appropriate carrier (Mishell
and Shiigi, eds, Selected Methods in Cellular Immunology, WH Freeman, San
Francisco, CA (1980); U.S. Patent 4,939,239); or mild formalin treatment (Marsh,
(1971) Int. Arch. of Allergy and Appl. Immunol., 41:199-215).

To facilitate purification and potentially increase solubility of an E. cloacae
protein or peptide, it is possible to add an amino acid fusion moiety to the peptide 
backbone. For example, hexa-histidine can be added to the protein for purification by 
immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988) 
Bio/Technology, 6:1321-1325). In addition, to facilitate isolation of peptides free of
irrelevant sequences, specific endoprotease cleavage sites can be introduced between 
the sequences of the fusion moiety and the peptide. 

To potentially aid proper antigen processing of epitopes within an E. cloacae 
polypeptide, canonical protease sensitive sites can be engineered, via recombinant or
synthetic methods, between regions each comprising at least one epitope. For
example, charged amino acid pairs, such as KK or RR, can be introduced between 
regions within a protein or fragment during recombinant construction thereof. The 
resulting peptide can be rendered sensitive to cleavage by cathepsin and/or other 
trypsin-like enzymes which would generate portions of the protein containing one or
more epitopes. In addition, such charged amino acid residues can result in an increase
in the solubility of the peptide. 

Primary Methods for Screening Polypeptides and Analogs 

Various techniques are known in the art for screening generated mutant gene
products. Techniques for screening large gene libraries often include cloning the gene
library into replicable expression vectors, transforming appropriate cells with the 
resulting library of vectors, and expressing the genes under conditions in which 
detection of a desired activity, e.g., in this case, binding to E. cloacae polypeptide or 
an interacting protein, facilitates relatively easy isolation of the vector encoding the
gene whose product was detected. Each of the techniques described below is
amenable to high-throughput analysis for screening large numbers of sequences
created, e.g., by random mutagenesis techniques.

Two Hybrid Systems 

Two hybrid assays such as the system described below (as with the other
screening methods described herein), can be used to identify polypeptides, e.g.,
fragments or analogs of a naturally-occurring E. cloacae polypeptide, e.g., of cellular 
proteins, or of randomly generated polypeptides which bind to an E. cloacae protein. 
(The E. cloacae domain is used as the bait protein and the library of variants is
expressed as prey fusion proteins.) In an analogous fashion, a two hybrid assay (as
with the other screening methods described herein), can be used to find polypeptides 
which bind an E. cloacae polypeptide. 

Display Libraries 

In one approach to screening assays, the Enterobacter peptides are displayed on
the surface of a cell or viral particle, and the ability of particular cells or viral
particles to bind an appropriate receptor protein via the displayed product is detected 
in a "panning assay". For example, the gene library can be cloned into the gene for a 
surface membrane protein of a bacterial cell, and the resulting fusion protein detected
by panning (Ladner et al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-
1371; and Goward et al. (1992) TIBS 18:136-140). In a similar fashion, a detectably 
labeled ligand can be used to score for potentially functional peptide homologs. 
Fluorescently labeled ligands, e.g., receptors, can be used to detect homologs which 
retain ligand-binding activity. The use of fluorescently labeled ligands allows cells to
be visually inspected and separated under a fluorescence microscope, or, where the
morphology of the cell permits, to be separated by a fluorescence-activated cell sorter.

A gene library can be expressed as a fusion protein on the surface of a viral
particle. For instance, in the filamentous phage system, foreign peptide sequences can
be expressed on the surface of infectious phage, thereby conferring two significant
benefits. First, since these phage can be applied to affinity matrices at concentrations
well over 10^13 phage per milliliter, a large number of phage can be screened at one
time. Second, since each infectious phage displays a gene product on its surface, if a
particular phage is recovered from an affinity matrix in low yield, the phage can be
amplified by another round of infection. The group of almost identical E. coli
filamentous phages, M13, fd, and f1, are most often used in phage display libraries.
Either of the phage gIII or gVIII coat proteins can be used to generate fusion proteins
without disrupting the ultimate packaging of the viral particle. Foreign epitopes can
be expressed at the NH2-terminal end of pIII and phage bearing such epitopes
recovered from a large excess of phage lacking this epitope (Ladner et al., PCT
publication WO 90/02909; Garrard et al., PCT publication WO 92/09690; Marks et al.
(1992) J. Biol. Chem. 267:16007-16010; Griffiths et al. (1993) EMBO J. 12:725-734;
Clackson et al. (1991) Nature 352:624-628; and Barbas et al. (1992) PNAS 89:4457- 
4461). 

A common approach uses the maltose receptor of E. coli (the outer membrane
protein, LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO J. 5, 3029-3037).
Oligonucleotides have been inserted into plasmids encoding the LamB gene to
produce peptides fused into one of the extracellular loops of the protein. These
peptides are available for binding to ligands, e.g., to antibodies, and can elicit an
immune response when the cells are administered to animals. Other cell surface
proteins, e.g., OmpA (Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE
(Agterberg, et al. (1990) Gene 88, 37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9,
1369-1372), as well as large bacterial surface structures have served as vehicles for
peptide display. Peptides can be fused to pilin, a protein which polymerizes to form
the pilus, a conduit for interbacterial exchange of genetic information (Thiry et al.
(1989) Appl. Environ. Microbiol. 55, 984-993). Because of its role in interacting with
other cells, the pilus provides a useful support for the presentation of peptides to the
extracellular environment. Another large surface structure used for peptide display is
the bacterial motive organ, the flagellum. Fusion of peptides to the subunit protein
flagellin offers a dense array of many peptide copies on the host cells (Kuwajima et
al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other bacterial species have
also served as peptide fusion partners. Examples include the Staphylococcus protein
A and the outer membrane IgA protease of Neisseria (Hansson et al. (1992) J.
Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-1999).

In the filamentous phage systems and the LamB system described above, the
physical link between the peptide and its encoding DNA occurs by the containment of
the DNA within a particle (cell or phage) that carries the peptide on its surface.
Capturing the peptide captures the particle and the DNA within. An alternative
scheme uses the DNA-binding protein LacI to form a link between peptide and DNA
(Cull et al. (1992) PNAS USA 89:1865-1869). This system uses a plasmid containing
the LacI gene with an oligonucleotide cloning site at its 3'-end. Under controlled
induction by arabinose, a LacI-peptide fusion protein is produced. This fusion retains
the natural ability of LacI to bind to a short DNA sequence known as the LacO operator
(LacO). By installing two copies of LacO on the expression plasmid, the LacI-peptide
fusion binds tightly to the plasmid that encoded it. Because the plasmids in each cell
contain only a single oligonucleotide sequence and each cell expresses only a single
peptide sequence, the peptides become specifically and stably associated with the
DNA sequence that directed their synthesis. The cells of the library are gently lysed
and the peptide-DNA complexes are exposed to a matrix of immobilized receptor to
recover the complexes containing active peptides. The associated plasmid DNA is
then reintroduced into cells for amplification and DNA sequencing to determine the
identity of the peptide ligands. As a demonstration of the practical utility of the
method, a large random library of dodecapeptides was made and selected on a
monoclonal antibody raised against the opioid peptide dynorphin B. A cohort of
peptides was recovered, all related by a consensus sequence corresponding to a
six-residue portion of dynorphin B (Cull et al. (1992) Proc. Natl. Acad. Sci. U.S.A.
89:1865-1869).

This scheme, sometimes referred to as peptides-on-plasmids, differs in two
important ways from the phage display methods. First, the peptides are attached to
the C-terminus of the fusion protein, resulting in the display of the library members as
peptides having free carboxy termini. Both of the filamentous phage coat proteins,
pIII and pVIII, are anchored to the phage through their C-termini, and the guest
peptides are placed into the outward-extending N-terminal domains. In some designs,
the phage-displayed peptides are presented right at the amino terminus of the fusion
protein (Cwirla, et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382). A second
difference is the set of biological biases affecting the population of peptides actually
present in the libraries. The LacI fusion molecules are confined to the cytoplasm of
the host cells. The phage coat fusions are exposed briefly to the cytoplasm during
translation but are rapidly secreted through the inner membrane into the periplasmic
compartment, remaining anchored in the membrane by their C-terminal hydrophobic
domains, with the N-termini, containing the peptides, protruding into the periplasm
while awaiting assembly into phage particles. The peptides in the LacI and phage
libraries may differ significantly as a result of their exposure to different proteolytic
activities. The phage coat proteins require transport across the inner membrane and
signal peptidase processing as a prelude to incorporation into phage. Certain peptides
exert a deleterious effect on these processes and are underrepresented in the libraries
(Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251). These particular biases are not
a factor in the LacI display system.

The number of small peptides available in recombinant random libraries is
enormous. Libraries of 10^7-10^9 independent clones are routinely prepared. Libraries
as large as 10^11 recombinants have been created, but this size approaches the practical
limit for clone libraries. This limitation in library size occurs at the step of
transforming the DNA containing randomized segments into the host bacterial cells.
To circumvent this limitation, an in vitro system based on the display of nascent
peptides in polysome complexes has recently been developed. This display library
method has the potential of producing libraries 3-6 orders of magnitude larger than the
currently available phage/phagemid or plasmid libraries. Furthermore, the construction
of the libraries, expression of the peptides, and screening are done in an entirely
cell-free format.
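
As a rough illustration of the library sizes discussed above, the following sketch
compares the number of independent clones in a library with the number of possible
random peptide sequences of a given length. It assumes that all 20 standard amino
acids are allowed at every position; the function names are illustrative only and are
not part of the methods cited here.

```python
# Illustrative arithmetic only; assumes 20 standard amino acids per position.
AMINO_ACIDS = 20


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


def max_coverage(library_size: float, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample.

    The bound is reached only if every clone encodes a different peptide.
    """
    return min(1.0, library_size / sequence_space(length))


if __name__ == "__main__":
    for size in (1e7, 1e9, 1e11):
        fraction = max_coverage(size, 10)
        print(f"{size:.0e} clones, decapeptides: at most {fraction:.2e} of all sequences")
```

Under these assumptions, even a library of 10^11 clones can sample less than one
percent of all possible decapeptides, which is consistent with the interest in larger,
cell-free libraries.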

In one application of this method (Gallop et al. (1994) J. Med. Chem.
37(9):1233-1251), a molecular DNA library encoding 10^12 decapeptides was
constructed and the library expressed in an E. coli S30 in vitro coupled
transcription/translation system. Conditions were chosen to stall the ribosomes on the
mRNA, causing the accumulation of a substantial proportion of the RNA in
polysomes and yielding complexes containing nascent peptides still linked to their
encoding RNA. The polysomes are sufficiently robust to be affinity purified on
immobilized receptors in much the same way as the more conventional recombinant
peptide display libraries are screened. RNA from the bound complexes is recovered,
converted to cDNA, and amplified by PCR to produce a template for the next round
of synthesis and screening. The polysome display method can be coupled to the
phage display system. Following several rounds of screening, cDNA from the
enriched pool of polysomes was cloned into a phagemid vector. This vector serves as
both a peptide expression vector, displaying peptides fused to the coat proteins, and as
a DNA sequencing vector for peptide identification. By expressing the
polysome-derived peptides on phage, one can either continue the affinity selection
procedure in this format or assay the peptides on individual clones for binding activity
in a phage ELISA, or for binding specificity in a competition phage ELISA (Barrett, et al.
(1992) Anal. Biochem. 204, 357-364). To identify the sequences of the active peptides one
sequences the DNA produced by the phagemid host.

Secondary Screening of Polypeptides and Analogs 

The high-throughput assays described above can be followed by secondary
screens in order to identify further biological activities which will, e.g., allow one
skilled in the art to differentiate agonists from antagonists. The type of secondary
screen used will depend on the desired activity that needs to be tested. For example,
an assay can be developed in which the ability to inhibit an interaction between a
protein of interest and its respective ligand can be used to identify antagonists from a
group of peptide fragments isolated through one of the primary screens described
above.

Therefore, methods for generating fragments and analogs and testing them for 
activity are known in the art. Once the core sequence of interest is identified, it is 
routine for one skilled in the art to obtain analogs and fragments. 

Peptide Mimetics of E. cloacae Polypeptides

The invention also provides for reduction of the protein binding domains of the
subject E. cloacae polypeptides to generate mimetics, e.g. peptide or non-peptide
agents. The peptide mimetics are able to disrupt binding of a polypeptide to its
counter ligand, e.g., in the case of an E. cloacae polypeptide binding to a naturally
occurring ligand. The critical residues of a subject E. cloacae polypeptide which are
involved in molecular recognition of a polypeptide can be determined and used to
generate E. cloacae-derived peptidomimetics which competitively or noncompetitively
inhibit binding of the E. cloacae polypeptide with an interacting polypeptide (see, for
example, European patent applications EP-412,762A and EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid
residues of a particular E. cloacae polypeptide involved in binding an interacting
polypeptide; peptidomimetic compounds (e.g. diazepine or isoquinoline derivatives)
can then be generated which mimic those residues in binding to an interacting
polypeptide, and which therefore can inhibit binding of an E. cloacae polypeptide to an
interacting polypeptide and thereby interfere with the function of the E. cloacae
polypeptide. For instance, non-hydrolyzable peptide analogs of such residues can be
generated using benzodiazepine (e.g., see Freidinger et al. in Peptides: Chemistry and
Biology, G.R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), azepine
(e.g., see Huffman et al. in Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM
Publisher: Leiden, Netherlands, 1988), substituted gamma lactam rings (Garvey et al. in
Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden,
Netherlands, 1988), keto-methylene pseudopeptides (Ewenson et al. (1986) J Med
Chem 29:295; and Ewenson et al. in Peptides: Structure and Function (Proceedings of
the 9th American Peptide Symposium) Pierce Chemical Co. Rockland, IL, 1985),
β-turn dipeptide cores (Nagai et al. (1985) Tetrahedron Lett 26:647; and Sato et al.
(1986) J Chem Soc Perkin Trans 1:1231), and β-aminoalcohols (Gordon et al. (1985)
Biochem Biophys Res Commun 126:419; and et al. (1986) Biochem Biophys Res
Commun 134:71).



Vaccine Formulations for E. cloacae Nucleic Acids and Polypeptides

This invention also features vaccine compositions for protection against
infection by E. cloacae or for treatment of E. cloacae infection. In one embodiment,
the vaccine compositions contain one or more immunogenic components such as a
surface protein from E. cloacae, or portion thereof, and a pharmaceutically acceptable
carrier. Nucleic acids within the scope of the invention are exemplified by the nucleic
acids of the invention contained in the Sequence Listing which encode E. cloacae
surface proteins. Any nucleic acid encoding an immunogenic E. cloacae protein, or
portion thereof, which is capable of expression in a cell, can be used in the present
invention. These vaccines have therapeutic and prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection
against infection by E. cloacae which contains at least one immunogenic fragment of
an E. cloacae protein and a pharmaceutically acceptable carrier. Preferred fragments
include peptides of at least about 10 amino acid residues in length, preferably about
10-20 amino acid residues in length, and more preferably about 12-16 amino acid
residues in length.
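
To make the preferred fragment lengths concrete, the following sketch enumerates
overlapping candidate peptides of a chosen length from a protein sequence. It is an
illustration only: the function name, the default window length, the step size and the
short test sequence are all hypothetical and are not taken from the Sequence Listing.

```python
def overlapping_fragments(sequence: str, length: int = 15, step: int = 5) -> list[str]:
    """Return overlapping peptide fragments of a fixed length.

    Windows start at positions 0, step, 2*step, ... and only full-length
    windows are returned.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) < length:
        return []
    return [
        sequence[start:start + length]
        for start in range(0, len(sequence) - length + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical 20-residue sequence used purely for demonstration.
    demo = "ACDEFGHIKLMNPQRSTVWY"
    for fragment in overlapping_fragments(demo, length=12, step=4):
        print(fragment)
```

Each fragment produced this way would still need to be synthesized or expressed and
then screened for immunogenicity as described in the text.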

Immunogenic components of the invention can be obtained, for example, by 
screening polypeptides recombinantly produced from the corresponding fragment of 
the nucleic acid encoding the full-length E. cloacae protein. In addition, fragments 
can be chemically synthesized using techniques known in the art such as conventional
Merrifield solid phase f-Moc or t-Boc chemistry.

In one embodiment, immunogenic components are identified by the ability of
the peptide to stimulate T cells. Peptides which stimulate T cells, as determined by, 
for example, T cell proliferation or cytokine secretion are defined herein as 
comprising at least one T cell epitope. T cell epitopes are believed to be involved in 
initiation and perpetuation of the immune response to the protein allergen which is 
responsible for the clinical symptoms of allergy. These T cell epitopes are thought to 
trigger early events at the level of the T helper cell by binding to an appropriate HLA 
molecule on the surface of an antigen presenting cell, thereby stimulating the T cell 
subpopulation with the relevant T cell receptor for the epitope. These events lead to 
T cell proliferation, lymphokine secretion, local inflammatory reactions, recruitment of 
additional immune cells to the site of antigen/T cell interaction, and activation of the 
B cell cascade, leading to the production of antibodies. A T cell epitope is the basic 
element, or smallest unit of recognition by a T cell receptor, where the epitope 
comprises amino acids essential to receptor recognition (e.g., approximately 6 or 7 
amino acid residues). Amino acid sequences which mimic those of the T cell epitopes 
are within the scope of this invention. 



Screening of immunogenic components can be accomplished using one or more
of several different assays. For example, in vitro, peptide T cell stimulatory activity is
assayed by contacting a peptide known or suspected of being immunogenic with an 
antigen presenting cell which presents appropriate MHC molecules in a T cell culture. 
Presentation of an immunogenic E. cloacae peptide in association with appropriate 
MHC molecules to T cells in conjunction with the necessary co-stimulation has the 
effect of transmitting a signal to the T cell that induces the production of increased 
levels of cytokines, particularly of interleukin-2 and interleukin-4. The culture 
supernatant can be obtained and assayed for interleukin-2 or other known cytokines. 
For example, any one of several conventional assays for interleukin-2 can be
employed, such as the assay described in Proc. Natl. Acad. Sci. USA, 86: 1333 (1989),
the pertinent portions of which are incorporated herein by reference. A kit for an 
assay for the production of interferon is also available from Genzyme Corporation 
(Cambridge, MA). 

Alternatively, a common assay for T cell proliferation entails measuring
tritiated thymidine incorporation. The proliferation of T cells can be measured in
vitro by determining the amount of 3H-labeled thymidine incorporated into the
replicating DNA of cultured cells. Therefore, the rate of DNA synthesis and, in turn,
the rate of cell division can be quantified.
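
Thymidine incorporation data of this kind are often summarized as a stimulation
index, i.e. the mean counts per minute (cpm) of peptide-stimulated cultures divided by
the mean cpm of unstimulated control cultures. The sketch below shows that
calculation under this assumption; the index is a common convention and is not
prescribed by the text, and the function name and example counts are hypothetical.

```python
from statistics import mean


def stimulation_index(stimulated_cpm: list[float], control_cpm: list[float]) -> float:
    """Ratio of mean cpm in stimulated wells to mean cpm in control wells."""
    control = mean(control_cpm)
    if control <= 0:
        raise ValueError("control cpm must be positive")
    return mean(stimulated_cpm) / control


if __name__ == "__main__":
    # Hypothetical replicate counts from triplicate wells.
    print(stimulation_index([3000, 3200, 2800], [1000, 1000, 1000]))
```

A higher index indicates more incorporation in the stimulated cultures; any cutoff for
calling a peptide stimulatory would have to be chosen and validated by the user.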

Vaccine compositions of the invention containing immunogenic components
(e.g., E. cloacae polypeptide or fragment thereof or nucleic acid encoding an E.
cloacae polypeptide or fragment thereof) preferably include a pharmaceutically
acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a carrier
that does not cause an allergic reaction or other untoward effect in patients to whom it
is administered. Suitable pharmaceutically acceptable carriers include, for example,
one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol
and the like, as well as combinations thereof. Pharmaceutically acceptable carriers
may further comprise minor amounts of auxiliary substances such as wetting or
emulsifying agents, preservatives or buffers, which enhance the shelf life or
effectiveness of the antibody. For vaccines of the invention containing E. cloacae
polypeptides, the polypeptide is co-administered with a suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective
amount of DNA or protein of this invention will depend, inter alia, upon the
administration schedule, the unit dose of antibody administered, whether the protein or
DNA is administered in combination with other therapeutic agents, the immune status
and health of the patient, and the therapeutic activity of the particular protein or DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by
injection, either subcutaneously or intramuscularly. Methods for intramuscular
immunization are described by Wolff et al. (1990) Science 247: 1465-1468 and by
Sedegah et al. (1994) Immunology 91: 9866-9870. Other modes of administration
include oral and pulmonary formulations, suppositories, and transdermal applications.
Oral immunization is preferred over parenteral methods for inducing protection against
infection by E. cloacae (Cain et al. (1993) Vaccine 11: 637-642). Oral formulations
include such normally employed excipients as, for example, pharmaceutical grades of
mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose,
magnesium carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including,
but not limited to, aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine
(thr-MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as
nor-MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2-dipalmitoyl-
sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835 A, referred to as MTP-PE);
RIBI, which contains three components from bacteria, namely monophosphoryl lipid A,
trehalose dimycolate, and cell wall skeleton (MPL + TDM + CWS), in a 2%
squalene/Tween 80 emulsion; and cholera toxin. Others which may be used are
non-toxic derivatives of cholera toxin, including its B subunit, and/or conjugates or
genetically engineered fusions of the E. cloacae polypeptide with cholera toxin or its
B subunit, procholeragenoid, fungal polysaccharides, including schizophyllan,
muramyl dipeptide, muramyl dipeptide derivatives, phorbol esters, labile toxin of E.
coli, non-E. cloacae bacterial lysates, block polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or
immuno-stimulating complexes (ISCOMs), cochleates, or liposomes, genetically
engineered attenuated live vectors such as viruses or bacteria, and recombinant
(chimeric) virus-like particles, e.g., bluetongue. The amount of adjuvant employed
will depend on the type of adjuvant used. For example, when the mucosal adjuvant is
cholera toxin, it is suitably used in an amount of 5 mg to 50 mg, for example 10 mg
to 35 mg. When used in the form of microcapsules, the amount used will depend on
the amount employed in the matrix of the microcapsule to achieve the desired dosage.
The determination of this amount is within the skill of a person of ordinary skill in
the art.

Carrier systems in humans may include enteric release capsules protecting the
antigen from the acidic environment of the stomach, and including E. cloacae
polypeptide in an insoluble form as fusion proteins. Suitable carriers for the vaccines
of the invention are enteric coated capsules and polylactide-glycolide microspheres.
Suitable diluents are 0.2 N NaHCO3 and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent
in adults or in children, as a secondary prevention, after successful eradication of E.
cloacae in an infected host, or as a therapeutic agent with the aim of inducing an immune
response in a susceptible host to prevent infection by E. cloacae. The vaccines of the
invention are administered in amounts readily determined by persons of ordinary skill
in the art. Thus, for adults a suitable dosage will be in the range of 10 mg to 10 g,
preferably 10 mg to 100 mg. A suitable dosage for adults will also be in the range
of 5 mg to 500 mg. Similar dosage ranges will be applicable for children. Those
skilled in the art will recognize that the optimal dose may be more or less depending
upon the patient's body weight, disease, the route of administration, and other factors.
Those skilled in the art will also recognize that appropriate dosage levels can be
obtained based on results with known oral vaccines such as, for example, a vaccine
based on an E. coli lysate (6 mg dose daily up to total of 540 mg) and with an
enterotoxigenic E. coli purified antigen (4 doses of 1 mg) (Schulman et al., J. Urol.
150:917-921 (1993); Boedecker et al., American Gastroenterological Assoc. 999:A-
222 (1993)). The number of doses will depend upon the disease, the formulation, and
efficacy data from clinical trials. Without intending any limitation as to the course of
treatment, the treatment can be administered over 3 to 8 doses for a primary
immunization schedule over 1 month (Boedeker, American Gastroenterological Assoc.
888:A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be
based on a killed whole E. coli preparation with an immunogenic fragment of an E.
cloacae protein of the invention expressed on its surface or it can be based on an E.
coli lysate, wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine
compositions of the invention are useful only for preventing E. cloacae infection,
some are useful only for treating E. cloacae infection, and some are useful for both
preventing and treating E. cloacae infection. In a preferred embodiment, the vaccine
composition of the invention provides protection against E. cloacae infection by
stimulating humoral and/or cell-mediated immunity against E. cloacae. It should be
understood that amelioration of any of the symptoms of E. cloacae infection is a
desirable clinical goal, including a lessening of the dosage of medication used to treat
E. cloacae-caused disease, or an increase in the production of antibodies in the serum
or mucus of patients.



Antibodies Reactive With E. cloacae Polypeptides 

The invention also includes antibodies specifically reactive with the subject E.
cloacae polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies can
be made by standard protocols (See, for example, Antibodies: A Laboratory Manual
ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a
mouse, a hamster or rabbit can be immunized with an immunogenic form of the
peptide. Techniques for conferring immunogenicity on a protein or peptide include
conjugation to carriers or other techniques well known in the art. An immunogenic
portion of the subject E. cloacae polypeptide can be administered in the presence of
adjuvant. The progress of immunization can be monitored by detection of antibody
titers in plasma or serum. Standard ELISA or other immunoassays can be used with
the immunogen as antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for
antigenic determinants of the E. cloacae polypeptides of the invention, e.g. antigenic
determinants of a polypeptide of the invention contained in the Sequence Listing, or a
closely related human or non-human mammalian homolog (e.g., 90% homologous,
more preferably at least about 95% homologous). In yet a further preferred
embodiment of the invention, the anti-E. cloacae antibodies do not substantially cross
react (i.e., react specifically) with a protein which is, for example, less than 80
percent homologous to a sequence of the invention contained in the Sequence Listing.
By "not substantially cross react", it is meant that the antibody has a binding affinity
for a non-homologous protein which is less than 10 percent, more preferably less than
5 percent, and even more preferably less than 1 percent, of the binding affinity for a
protein of the invention contained in the Sequence Listing. In a most preferred
embodiment, there is no cross-reactivity between bacterial and mammalian antigens.
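
The thresholds in the definition of "not substantially cross react" can be expressed as
a simple ratio. The sketch below is illustrative only; it assumes that binding affinity
is given as an association constant (larger values mean tighter binding), and the
function names and example values are hypothetical.

```python
def cross_reactivity_percent(affinity_nonhomolog: float, affinity_target: float) -> float:
    """Affinity for a non-homologous protein as a percentage of target affinity.

    Both values are assumed to be association constants in the same units.
    """
    if affinity_target <= 0:
        raise ValueError("target affinity must be positive")
    return 100.0 * affinity_nonhomolog / affinity_target


def preference_level(percent: float) -> str:
    """Map a cross-reactivity percentage onto the tiers stated in the text."""
    if percent < 1:
        return "less than 1 percent (even more preferred)"
    if percent < 5:
        return "less than 5 percent (more preferred)"
    if percent < 10:
        return "less than 10 percent (not substantially cross reactive)"
    return "10 percent or more (substantially cross reactive)"


if __name__ == "__main__":
    # Hypothetical association constants in 1/M.
    pct = cross_reactivity_percent(1e6, 1e9)
    print(pct, preference_level(pct))
```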

The term antibody as used herein is intended to include fragments thereof
which are also specifically reactive with E. cloacae polypeptides. Antibodies can be
fragmented using conventional techniques and the fragments screened for utility in the
same manner as described above for whole antibodies. For example, F(ab')2
fragments can be generated by treating antibody with pepsin. The resulting F(ab')2
fragment can be treated to reduce disulfide bridges to produce Fab' fragments. The
antibody of the invention is further intended to include bispecific and chimeric
molecules having an anti-E. cloacae portion.

Both monoclonal and polyclonal antibodies (Ab) directed against E. cloacae
polypeptides or E. cloacae polypeptide variants, and antibody fragments such as Fab'
and F(ab')2, can be used to block the action of E. cloacae polypeptide and allow the
study of the role of a particular E. cloacae polypeptide of the invention in aberrant or
unwanted intracellular signaling, as well as the normal cellular function of
E. cloacae, e.g., by microinjection of anti-E. cloacae polypeptide antibodies of the
present invention.

Antibodies which specifically bind E. cloacae epitopes can also be used in
immunohistochemical staining of tissue samples in order to evaluate the abundance
and pattern of expression of E. cloacae antigens. Anti-E. cloacae polypeptide
antibodies can be used diagnostically in immuno-precipitation and immuno-blotting to
detect and evaluate E. cloacae levels in tissue or bodily fluid as part of a clinical
testing procedure. Likewise, the ability to monitor E. cloacae polypeptide levels in an
individual can allow determination of the efficacy of a given treatment regimen for an
individual afflicted with such a disorder. The level of an E. cloacae polypeptide can
be measured in cells found in bodily fluid, such as in urine samples, or can be
measured in tissue, such as produced by gastric biopsy. Diagnostic assays using
anti-E. cloacae antibodies can include, for example, immunoassays designed to aid in early
diagnosis of E. cloacae infections. The present invention can also be used as a
method of detecting antibodies contained in samples from individuals infected by this
bacterium using specific E. cloacae antigens.

Another application of anti-E. cloacae polypeptide antibodies of the invention
is in the immunological screening of cDNA libraries constructed in expression vectors
such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type,
having coding sequences inserted in the correct reading frame and orientation, can
produce fusion proteins. For instance, λgt11 will produce fusion proteins whose
amino termini consist of β-galactosidase amino acid sequences and whose carboxy
termini consist of a foreign polypeptide. Antigenic epitopes of a subject E. cloacae
polypeptide can then be detected with antibodies, as, for example, reacting
nitrocellulose filters lifted from infected plates with anti-E. cloacae polypeptide
antibodies. Phage, scored by this assay, can then be isolated from the infected plate.
Thus, the presence of E. cloacae gene homologs can be detected and cloned from
other species, and alternate isoforms (including splicing variants) can be detected and
cloned.

Kits Containing Nucleic Acids, Polypeptides or Antibodies of the Invention 
The nucleic acids, polypeptides and antibodies of the invention can be
combined with other reagents and articles to form kits. Kits for diagnostic purposes
typically comprise the nucleic acid, polypeptides or antibodies in vials or other
suitable vessels. Kits typically comprise other reagents for performing hybridization
reactions, polymerase chain reactions (PCR), or for reconstitution of lyophilized
components, such as aqueous media, salts, buffers, and the like. Kits may also
comprise reagents for sample processing such as detergents, chaotropic salts and the
like. Kits may also comprise immobilization means such as particles, supports, wells,
dipsticks and the like. Kits may also comprise labeling means such as dyes,
developing reagents, radioisotopes, fluorescent agents, luminescent or
chemiluminescent agents, enzymes, intercalating agents and the like. With the nucleic
acid and amino acid sequence information provided herein, individuals skilled in the art
can readily assemble kits to serve their particular purpose. Kits further can include
instructions for use.




Bio chip Technology 



The nucleic acid sequence of the present invention may be used to detect E.
cloacae or other species of Enterobacter nucleic acid sequence using bio chip technology.
Bio chips containing arrays of nucleic acid sequence can also be used to measure
expression of genes of E. cloacae or other species of Enterobacter. For example, to
diagnose a patient with an E. cloacae or other Enterobacter infection, a sample from a
human or animal can be used as a probe on a bio chip containing an array of nucleic
acid sequence from the present invention. In addition, a sample from a disease state
can be compared to a sample from a non-disease state, which would help identify a
gene that is up-regulated or expressed in the disease state. This would provide
valuable insight as to the mechanism by which the disease manifests. Changes in
gene expression can also be used to identify critical pathways involved in drug
transport or metabolism, and may enable the identification of novel targets involved in
virulence or host cell interactions involved in maintenance of an infection. Procedures
using such techniques have been described by Brown et al., 1995, Science 270: 467-470.
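
The comparison of a disease-state sample with a non-disease-state sample can be
sketched as a fold-change calculation over array signal intensities. The example
below is a simplified illustration, not the procedure of the cited reference: the
function name, the pseudocount, the two-fold cutoff and the gene labels are all
assumptions, and real array data would also require normalization and replicates.

```python
import math


def upregulated_genes(
    disease: dict[str, float],
    control: dict[str, float],
    min_log2_fold: float = 1.0,
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """Return genes whose log2(disease/control) signal meets the cutoff.

    A pseudocount is added to both signals to avoid division by zero.
    Only genes measured in both samples are compared.
    """
    hits = {}
    for gene in disease.keys() & control.keys():
        ratio = (disease[gene] + pseudocount) / (control[gene] + pseudocount)
        log2_fold = math.log2(ratio)
        if log2_fold >= min_log2_fold:
            hits[gene] = log2_fold
    return hits


if __name__ == "__main__":
    # Hypothetical signal intensities for two placeholder genes.
    disease_signal = {"geneA": 799.0, "geneB": 99.0}
    control_signal = {"geneA": 99.0, "geneB": 99.0}
    print(upregulated_genes(disease_signal, control_signal))
```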

Bio chips can also be used to monitor the genetic changes of potential
therapeutic compounds, including deletions, insertions or mismatches. Once the
therapeutic is added to the patient, changes to the genetic sequence can be evaluated
to assess its efficacy. In addition, the nucleic acid sequence of the present invention
can be used to determine essential genes in cell cycling. As described in Iyer et al., 1999
(Science, 283:83-87), genes essential in the cell cycle can be identified using bio
chips. Furthermore, the present invention provides nucleic acid sequence which can
be used with bio chip technology to understand regulatory networks in bacteria,
measure the response to environmental signals or drugs as in drug screening, and
study virulence induction (Mons et al., 1998, Nature Biotechnology, 16: 45-48).
Patents teaching this technology include U.S. Patents 5445934, 5744305, and 5800992.

Drug Screening Assays Using E. cloacae Polypeptides 
By making available purified and recombinant E. cloacae polypeptides, the
present invention provides assays which can be used to screen for drugs which are
either agonists or antagonists of the normal cellular function, in this case, of the
subject E. cloacae polypeptides, or of their role in intracellular signaling. Such
inhibitors or potentiators may be useful as new therapeutic agents to combat E.
cloacae infections in humans. A variety of assay formats will suffice and, in light of
the present inventions, will be comprehended by the person skilled in the art.

In many drug screening programs which test libraries of compounds and
natural extracts, high throughput assays are desirable in order to maximize the number
of compounds surveyed in a given period of time. Assays which are performed in
cell-free systems, such as may be derived with purified or semi-purified proteins, are
often preferred as "primary" screens in that they can be generated to permit rapid
development and relatively easy detection of an alteration in a molecular target which
is mediated by a test compound. Moreover, the effects of cellular toxicity and/or
bioavailability of the test compound can be generally ignored in the in vitro system,
the assay instead being focused primarily on the effect of the drug on the molecular
target as may be manifest in an alteration of binding affinity with other proteins or
change in enzymatic properties of the molecular target. Accordingly, in an exemplary
screening assay of the present invention, the compound of interest is contacted with an
isolated and purified E. cloacae polypeptide.

Screening assays can be constructed in vitro with a purified E. cloacae
polypeptide or fragment thereof, such as an E. cloacae polypeptide having enzymatic
activity, such that the activity of the polypeptide produces a detectable reaction
product. The efficacy of the compound can be assessed by generating dose response
curves from data obtained using various concentrations of the test compound.
Moreover, a control assay can also be performed to provide a baseline for comparison.
Suitable products include those with distinctive absorption, fluorescence, or
chemiluminescence properties, for example, because detection may be easily automated. A
variety of synthetic or naturally occurring compounds can be tested in the assay to
identify those which inhibit or potentiate the activity of the E. cloacae polypeptide.
Some of these active compounds may directly, or with chemical alterations to promote
membrane permeability or solubility, also inhibit or potentiate the same activity (e.g.,
enzymatic activity) in whole, live E. cloacae cells.
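
As an illustration of the dose response analysis mentioned above, the sketch below converts product signal into percent inhibition relative to the control assay and estimates the concentration giving 50% inhibition by interpolating on a log concentration scale. The readings and the interpolation method are assumptions made for illustration; curve fitting to a sigmoidal model would be an equally valid choice.

```python
import math

def percent_inhibition(signal, control_signal):
    """Inhibition relative to the no-compound control assay, in percent."""
    return 100.0 * (1.0 - signal / control_signal)

def interpolate_ic50(concentrations, inhibitions):
    """Estimate the concentration giving 50% inhibition.

    Interpolates linearly in log10(concentration) between the two measured
    points that bracket 50%. Returns None if 50% is never crossed.
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi != i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None

# Hypothetical readings: product signal at each compound concentration (uM).
control = 1000.0
readings = {0.1: 950.0, 1.0: 750.0, 10.0: 250.0, 100.0: 50.0}
inhib = [percent_inhibition(s, control) for s in readings.values()]
ic50 = interpolate_ic50(list(readings.keys()), inhib)
print(round(ic50, 2))
```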

Overexpression Assays 

Overexpression assays are based on the premise that overproduction of a
protein would lead to a higher level of resistance to compounds that selectively
interfere with the function of that protein. Overexpression assays may be used to
identify compounds that interfere with the function of virtually any type of protein,
including without limitation enzymes, receptors, DNA- or RNA-binding proteins, or
any proteins that are directly or indirectly involved in regulating cell growth.

Typically, two bacterial strains are constructed. One contains a single copy of
the gene of interest, and a second contains several copies of the same gene.
Identification of useful inhibitory compounds using this type of assay is based on a
comparison of the activity of a test compound in inhibiting growth and/or viability of
the two strains. The method involves constructing a nucleic acid vector that directs
high level expression of a particular target nucleic acid. The vectors are then
transformed into host cells in single or multiple copies to produce strains that express
low to moderate and high levels of protein encoded by the target sequence (strain A
and B, respectively). Nucleic acid comprising sequences encoding the target gene
can, of course, be directly integrated into the host cell.

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on the growth of the two strains. Agents
which interfere with an unrelated target equally inhibit the growth of both strains.
Agents which interfere with the function of the target should, at high concentration,
inhibit the growth of both strains. It should be possible, however, to titrate out the
inhibitory effect of the compound in the overexpressing strain. That is, if the
compound is affecting the particular target that is being tested, it should be possible to
inhibit the growth of strain A at a concentration of the compound that allows strain B
to grow.
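
The comparison of strains A and B can be sketched as a comparison of minimum inhibitory concentrations. The growth readings, the growth threshold and the two-fold shift cutoff below are hypothetical values chosen for illustration, not parameters taken from this disclosure.

```python
def mic(growth_by_conc, threshold=0.1):
    """Lowest tested concentration at which growth (e.g. OD600) is below threshold.

    Returns None if the strain grows at every tested concentration.
    """
    for conc in sorted(growth_by_conc):
        if growth_by_conc[conc] < threshold:
            return conc
    return None

def is_target_specific(strain_a, strain_b, min_shift=2.0):
    """True when overexpressing strain B tolerates at least min_shift-fold more
    compound than strain A, as expected for an on-target inhibitor."""
    mic_a, mic_b = mic(strain_a), mic(strain_b)
    if mic_a is None:
        return False  # compound never inhibits strain A at the tested doses
    if mic_b is None:
        return True   # strain B grows at every tested dose
    return mic_b / mic_a >= min_shift

# Hypothetical growth readings keyed by compound concentration (ug/ml).
on_target_a = {1: 0.80, 2: 0.05, 4: 0.02, 8: 0.01}
on_target_b = {1: 0.90, 2: 0.85, 4: 0.70, 8: 0.04}
off_target_a = {1: 0.80, 2: 0.60, 4: 0.03, 8: 0.01}
off_target_b = {1: 0.85, 2: 0.65, 4: 0.04, 8: 0.01}

print(is_target_specific(on_target_a, on_target_b))
print(is_target_specific(off_target_a, off_target_b))
```

An agent acting on an unrelated target inhibits both strains at the same concentration, so no shift is seen and the second call reports no target specificity.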

Alternatively, a bacterial strain is constructed that contains the gene of interest
under the control of an inducible promoter. Identification of useful inhibitory agents
using this type of assay is based on a comparison of the activity of a test compound in
inhibiting growth and/or viability of this strain under both inducing and non-inducing
conditions. The method involves constructing a nucleic acid vector that directs
high-level expression of a particular target nucleic acid. The vector is then transformed
into host cells that are grown under both non-inducing and inducing conditions
(conditions A and B, respectively).

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on growth under these two conditions.
Agents that interfere with the function of the target should inhibit growth under both
conditions. It should be possible, however, to titrate out the inhibitory effect of the
compound under the inducing (overexpressing) condition. That is, if the compound is
affecting the particular target that is being tested, it should be possible to inhibit
growth under condition A at a concentration that allows the strain to grow under
condition B.

Ligand-binding Assays 
Many of the targets according to the invention have functions that have not yet
been identified. Ligand-binding assays are useful to identify inhibitor compounds that
interfere with the function of a particular target, even when that function is unknown.
These assays are designed to detect binding of test compounds to particular targets.
The detection may involve direct measurement of binding. Alternatively, indirect
indications of binding may involve stabilization of protein structure or disruption of a
biological function. Non-limiting examples of useful ligand-binding assays are
detailed below.

A useful method for the detection and isolation of binding proteins is the
Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor
and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The BIAcore
system uses an affinity purified anti-GST antibody to immobilize GST-fusion proteins
onto a sensor chip. The sensor utilizes surface plasmon resonance, which is an optical
phenomenon that detects changes in refractive indices. In accordance with the
practice of the invention, a protein of interest is coated onto a chip and test
compounds are passed over the chip. Binding is detected by a change in the
refractive index (surface plasmon resonance).

A different type of ligand-binding assay involves scintillation proximity assays 
(SPA, described in U.S. Patent No. 4,568,649). 

Another type of ligand binding assay, also undergoing development, is based
on the fact that proteins containing mitochondrial targeting signals are imported into
isolated mitochondria in vitro (Hurt et al., 1985, EMBO J. 4:2061-2068; Eilers and
Schatz, Nature, 1986, 322:228-231). In a mitochondrial import assay, expression
vectors are constructed in which nucleic acids encoding particular target proteins are
inserted downstream of sequences encoding mitochondrial import signals. The
chimeric proteins are synthesized and tested for their ability to be imported into
isolated mitochondria in the absence and presence of test compounds. A test
compound that binds to the target protein should inhibit its uptake into isolated
mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song,
1989, Nature 340:245-246). The yeast two-hybrid system takes advantage of the
properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4
protein is a transcriptional activator required for the expression of genes encoding
enzymes of galactose utilization. This protein consists of two separable and
functionally essential domains: an N-terminal domain which binds to specific DNA
sequences (UASG); and a C-terminal domain containing acidic regions, which is
necessary to activate transcription. The native GAL4 protein, containing both
domains, is a potent activator of transcription when yeast are grown on galactose
media. The N-terminal domain binds to DNA in a sequence-specific manner but is
unable to activate transcription. The C-terminal domain contains the activating
regions but cannot activate transcription because it fails to be localized to UASG. The
two-hybrid system uses two hybrid proteins containing parts of GAL4: (1)
a GAL4 DNA-binding domain fused to a protein 'X' and (2) a GAL4 activation region
fused to a protein 'Y'. If X and Y can form a protein-protein complex and reconstitute
proximity of the GAL4 domains, transcription of a gene regulated by UASG occurs.
Creation of two hybrid proteins, each containing one of the interacting proteins X and
Y, allows the activation region of GAL4 to be brought to its normal site of action.

The binding assay described in Fodor et al., 1991, Science 251:767-773, which
involves testing the binding affinity of test compounds for a plurality of defined
polymers synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially
useful as antibacterial agents for use in therapeutic compositions.

Pharmaceutical formulations suitable for antibacterial therapy comprise the
antibacterial agent in conjunction with one or more biologically acceptable carriers.
Suitable biologically acceptable carriers include, but are not limited to,
phosphate-buffered saline, saline, deionized water, or the like. Preferred biologically
acceptable carriers are physiologically or pharmaceutically acceptable carriers.

The antibacterial compositions include an antibacterial effective amount of
active agent. Antibacterial effective amounts are those quantities of the antibacterial
agents of the present invention that afford prophylactic protection against bacterial
infections or which result in amelioration or cure of an existing bacterial infection.
This antibacterial effective amount will depend upon the agent, the location and nature
of the infection, and the particular host. The amount can be determined by
experimentation known in the art, such as by establishing a matrix of dosages and
frequencies and comparing a group of experimental units or subjects to each point in
the matrix.

The antibacterial active agents or compositions can be formed into dosage unit
forms, such as for example, creams, ointments, lotions, powders, liquids, tablets,
capsules, suppositories, sprays, aerosols or the like. If the antibacterial composition is
formulated into a dosage unit form, the dosage unit form may contain an antibacterial
effective amount of active agent. Alternatively, the dosage unit form may include less
than such an amount if multiple dosage unit forms or multiple dosages are to be used
to administer a total dosage of the active agent. Dosage unit forms can include, in
addition, one or more excipient(s), diluent(s), disintegrant(s), lubricant(s),
plasticizer(s), colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s),
bactericide(s), or the like.

For general information concerning formulations, see, e.g., Gilman et al. (eds.),
1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics, 8th ed.,
Pergamon Press; and Remington's Pharmaceutical Sciences, 17th ed., 1990, Mack
Publishing Co., Easton, PA; Avis et al. (eds.), 1993, Pharmaceutical Dosage Forms:
Parenteral Medications, Dekker, New York; Lieberman et al. (eds.), 1990,
Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antibacterial agents and compositions of the present invention are useful
for preventing or treating E. cloacae infections. Infection prevention methods
incorporate a prophylactically effective amount of an antibacterial agent or
composition. A prophylactically effective amount is an amount effective to prevent E.
cloacae infection and will depend upon the specific bacterial strain, the agent, and the
host. These amounts can be determined experimentally by methods known in the art
and as described above.

E. cloacae infection treatment methods incorporate a therapeutically effective
amount of an antibacterial agent or composition. A therapeutically effective amount is
an amount sufficient to ameliorate or eliminate the infection. The prophylactically
and/or therapeutically effective amounts can be administered in one administration or
over repeated administrations. Therapeutic administration can be followed by
prophylactic administration, once the initial bacterial infection has been resolved.

The antibacterial agents and compositions can be administered topically or
systemically. Topical application is typically achieved by administration of creams,
ointments, lotions, or sprays as described above. Systemic administration includes
both oral and parenteral routes. Parenteral routes include, without limitation,
subcutaneous, intramuscular, intraperitoneal, intravenous, transdermal, inhalation and
intranasal administration.



EXEMPLIFICATION 


Cloning and Sequencing E. cloacae Genomic Sequence 

This invention provides nucleotide sequences of the genome of E. cloacae
which thus comprises a DNA sequence library of E. cloacae genomic DNA. The
detailed description that follows provides nucleotide sequences of E. cloacae, and also
describes how the sequences were obtained and how ORFs (Open Reading Frames)
and protein-coding sequences can be identified. Also described are methods of using
the disclosed E. cloacae sequences in methods including diagnostic and therapeutic
applications. Furthermore, the library can be used as a database for identification and
comparison of medically important sequences in this and other strains of E. cloacae as
well as other species of Enterobacter.

Chromosomal DNA from strain 15842 of E. cloacae was isolated after
Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation,
phenol:chloroform extraction and ethanol precipitation (Soll, D.R., T. Srikantha and
S.R. Lockhart: Characterizing Developmentally Regulated Genes in E. cloacae. In
Microbial Genome Methods. K.W. Adolph, editor. CRC Press. New York, p 17-37.).
Genomic E. cloacae DNA was hydrodynamically sheared in an HPLC and then
separated on a standard 1% agarose gel. Fractions corresponding to 2500-3000 bp in
length were excised from the gel and purified by the GeneClean procedure (Bio 101,
Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA
polymerase. The healed DNA was then ligated to unique BstXI-linker adapters
(5'-GTCTTCACCACGGGG-3' and 5'-GTGGTGAAGAC-3' in 100-1000 fold molar
excess). These linkers are complementary to the BstXI-cut pGTC vector, while the
overhang is not self-complementary. Therefore, the linkers will not concatemerize, nor
will the cut vector religate to itself easily. The linker-adapted inserts were separated from
the unincorporated linkers on a 1% agarose gel and purified using GeneClean. The
linker-adapted inserts were then ligated to BstXI-cut vector to construct "shotgun"
subclone libraries.

Only major modifications to the protocols are highlighted. Briefly, the library
was then transformed into DH5α competent cells (Gibco/BRL, DH5α transformation
protocol). It was assessed by plating onto antibiotic plates containing ampicillin and
IPTG/Xgal. The plates were incubated overnight at 37°C. Transformants were then
used for plating of clones and picking for sequencing. The cultures were grown
overnight at 37°C. DNA was purified using a silica bead DNA preparation
(Engelstein, 1996) method. In this manner, 25 µg of DNA was obtained per clone.

These purified DNA samples were then sequenced using primarily ABI
dye-terminator chemistry. All subsequent steps were based on sequencing by ABI377
automated DNA sequencing methods. The ABI dye terminator sequence reads were
run on ABI377 machines and the data was transferred to UNIX machines following
lane tracking of the gels. Base calls and quality scores were determined using the
program PHRED (Ewing et al., 1998, Genome Res. 8: 175-185; Ewing and Green,
1998, Genome Res. 8: 685-734). Reads were assembled using PHRAP (P. Green,
Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, Jan.
1996, p. 157) with default program parameters and quality scores. The initial assembly
was done at 6-fold coverage and yielded 513 contigs.
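
For orientation, the quality scores and the fold coverage mentioned above relate to simple formulas, sketched below. The read count, read length and genome length in the example are hypothetical round numbers chosen to give 6-fold coverage; they are not figures from this disclosure. The uncovered-fraction estimate uses the standard Poisson (Lander-Waterman) approximation, which is an added assumption.

```python
import math

def phred_to_error_prob(q):
    """A Phred-scaled quality Q corresponds to an error probability 10**(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Inverse of phred_to_error_prob."""
    return -10.0 * math.log10(p)

def fold_coverage(n_reads, mean_read_len, genome_len):
    """Average number of times each base is sequenced."""
    return n_reads * mean_read_len / genome_len

def expected_uncovered_fraction(coverage):
    """Poisson estimate exp(-c) of the fraction of bases covered by no read."""
    return math.exp(-coverage)

# Hypothetical project size: 60,000 reads of 500 bases over a 5 Mb genome.
cov = fold_coverage(60_000, 500, 5_000_000)
print(cov, round(expected_uncovered_fraction(cov), 5))
print(phred_to_error_prob(20))
```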




Finishing can follow the initial assembly. Missing mates (sequences from
clones that only gave reads from one end of the Enterobacter DNA inserted in the
plasmid) can be identified and sequenced with ABI technology to allow the
identification of additional overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also
performed. Sequencing on both sides was done for all lambda sequences. The
lambda library backbone helped to verify the integrity of the assembly and allowed
closure of some of the physical gaps. Primers for walking off the ends of contigs
would be selected using pick_primer (a GTC program) near the ends of the clones to
facilitate gap closure. These walks can be sequenced using the selected clones and
primers. These data are then reassembled with PHRAP. Additional sequencing using
PCR-generated templates and screened and/or unscreened lambda templates can be
done in addition.

To identify E. cloacae polypeptides, the complete genomic sequence of E.
cloacae was analyzed essentially as follows: First, all possible stop-to-stop open
reading frames (ORFs) greater than 180 nucleotides in all six reading frames were
translated into amino acid sequences. Second, the identified ORFs were analyzed for
homology to known (archaebacterial, prokaryotic and eukaryotic) protein sequences.
Third, the coding potential of non-homologous sequences was evaluated with the
program GENEMARK(TM) (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
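
The first step above, translation of stop-to-stop ORFs longer than 180 nucleotides in all six frames, can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the program actually used: it applies the standard genetic code, and it also reports stop-free stretches that run to the end of the sequence (for example at contig ends).

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def stop_to_stop_orfs(seq, min_nt=180):
    """Yield (strand, frame, peptide) for each stop-free stretch longer than min_nt."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            codons = [s[i:i + 3] for i in range(frame, len(s) - 2, 3)]
            # Codons with ambiguous bases are translated as 'X'.
            protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
            for peptide in protein.split("*"):
                if len(peptide) * 3 > min_nt:
                    yield strand, frame, peptide

# Hypothetical test sequence: 61 alanine codons between two stop codons.
demo = "TAA" + "GCT" * 61 + "TAG"
hits = list(stop_to_stop_orfs(demo))
print(len(hits), hits[0][0], hits[0][1], len(hits[0][2]))
```

The translated peptides would then be passed to the homology search and coding-potential steps, which are not reproduced here.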

Identification, Cloning and Expression of E. cloacae Nucleic Acids

Expression and purification of the E. cloacae polypeptides of the invention can
be performed essentially as outlined below.

To facilitate the cloning, expression and purification of membrane and secreted
proteins from E. cloacae, a gene expression system, such as the pET System
(Novagen), for cloning and expression of recombinant proteins in E. coli, is selected.
Also, a DNA sequence encoding a peptide tag, the His-Tag, is fused to the 3' end of
DNA sequences of interest in order to facilitate purification of the recombinant protein
products. The 3' end is selected for fusion in order to avoid alteration of any 5'
terminal signal sequence.



PCR Amplification and Cloning of Nucleic Acids Containing ORF's Encoding
Enzymes

Nucleic acids chosen (for example, from the nucleic acids set forth in SEQ ID
NO: 1 - SEQ ID NO: 5662) for cloning from the 15842 strain of E. cloacae are
prepared for amplification cloning by polymerase chain reaction (PCR). Synthetic
oligonucleotide primers specific for the 5' and 3' ends of open reading frames
(ORFs) are designed and purchased from GibcoBRL Life Technologies (Gaithersburg,
MD, USA). All forward primers (specific for the 5' end of the sequence) are designed
to include an NcoI cloning site at the extreme 5' terminus. These primers are
designed to permit initiation of protein translation at a methionine residue followed by
a valine residue and the coding sequence for the remainder of the native E. cloacae
DNA sequence. All reverse primers (specific for the 3' end of any E. cloacae ORF)
include an EcoRI site at the extreme 5' terminus to permit cloning of each E. cloacae
sequence into the reading frame of the pET-28b. The pET-28b vector provides
sequence encoding an additional 20 carboxy-terminal amino acids including six
histidine residues (at the extreme C-terminus), which comprise the His-Tag.

Genomic DNA prepared from the 15842 strain of E. cloacae is used as the
source of template DNA for PCR amplification reactions (Current Protocols in
Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To
amplify a DNA sequence containing an E. cloacae ORF, genomic DNA (50
nanograms) is introduced into a reaction vial containing 2 mM MgCl2, 1 micromolar
synthetic oligonucleotide primers (forward and reverse primers) complementary to and
flanking a defined E. cloacae ORF, 0.2 mM of each deoxynucleotide triphosphate
(dATP, dGTP, dCTP, dTTP) and 2.5 units of heat stable DNA polymerase (Amplitaq,
Roche Molecular Systems, Inc., Branchburg, NJ, USA) in a final volume of 100
microliters.
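
The reaction composition above can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The sketch below is only a worked example: the stock concentrations (25 mM MgCl2, 10 µM of each primer, 10 mM of each dNTP, 5 units/µl polymerase) are hypothetical and are not stated in this disclosure, and "1 micromolar" is read as the final concentration of each primer.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must use the same unit."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * final_volume_ul / stock_conc

FINAL_VOLUME_UL = 100.0  # final reaction volume given in the text

# (final concentration from the text, hypothetical stock concentration)
components = {
    "MgCl2 (mM)": (2.0, 25.0),
    "forward primer (uM)": (1.0, 10.0),
    "reverse primer (uM)": (1.0, 10.0),
    "dNTP mix (mM each)": (0.2, 10.0),
}

volumes = {
    name: stock_volume_ul(final, stock, FINAL_VOLUME_UL)
    for name, (final, stock) in components.items()
}
volumes["polymerase (5 U/ul stock)"] = 2.5 / 5.0  # 2.5 units from the text
# Whatever volume is left is made up of template, buffer and water.
volumes["template + buffer + water"] = FINAL_VOLUME_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} ul")
```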

Upon completion of thermal cycling reactions, each sample of amplified DNA
is washed and purified using the Qiaquick Spin PCR purification kit (Qiagen,
Gaithersburg, MD, USA). All amplified DNA samples are subjected to digestion with
the restriction endonucleases, e.g., NcoI and EcoRI (New England BioLabs, Beverly,
MA, USA) (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F.
Ausubel et al., eds., 1994). DNA samples are then subjected to electrophoresis on 1.0%
NuSieve (FMC BioProducts, Rockland, ME, USA) agarose gels. DNA is visualized
by exposure to ethidium bromide and long wave UV irradiation. DNA contained in
slices isolated from the agarose gel is purified using the Bio 101 GeneClean Kit
protocol (Bio 101, Vista, CA, USA).



Cloning of E. cloacae Nucleic Acids Into an Expression Vector 



The pET-28b vector is prepared for cloning by digestion with restriction
endonucleases, e.g., NcoI and EcoRI (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). The pET-28a vector, which
encodes a His-Tag that can be fused to the 5' end of an inserted gene, is prepared by
digestion with appropriate restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously
digested pET-28b expression vector. Products of the ligation reaction are then used to
transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

Transformation Of Competent Bacteria With Recombinant Plasmids 

Competent bacteria, E. coli strain BL21 or E. coli strain BL21(DE3), are
transformed with recombinant pET expression plasmids carrying the cloned E. cloacae
sequences according to standard methods (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of ligation reaction
is mixed with 50 microliters of electrocompetent cells and subjected to a high voltage
pulse, after which samples are incubated in 0.45 milliliters SOC medium (0.5% yeast
extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4
and 20 mM glucose) at 37°C with shaking for 1 hour. Samples are then spread on
LB agar plates containing 25 microgram/ml kanamycin sulfate for growth overnight.
Transformed colonies of BL21 are then picked and analyzed to evaluate cloned inserts
as described below.

Identification Of Recombinant Expression Vectors With E. cloacae Nucleic Acids 



Individual BL21 clones transformed with recombinant pET-28b E. cloacae
ORFs are analyzed by PCR amplification of the cloned inserts using the same forward
and reverse primers, specific for each E. cloacae sequence, that were used in the
original PCR amplification cloning reactions. Successful amplification verifies the
integration of the E. cloacae sequences in the expression vector (Current Protocols in
Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).

Isolation and Preparation of Nucleic Acids From Transformants 

Individual clones of recombinant pET-28b vectors carrying properly cloned E.
cloacae ORFs are picked and incubated in 5 ml of LB broth plus 25 microgram/ml
kanamycin sulfate overnight. The following day plasmid DNA is isolated and purified
using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, CA, USA).

Expression Of Recombinant E. cloacae Sequences In E. coli 

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174,
HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts
for expression include E. coli strains containing a chromosomal copy of the gene for
T7 RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda
derivative that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA
polymerase. T7 RNA polymerase is induced by addition of
isopropyl-β-D-thiogalactoside (IPTG), and the T7 RNA polymerase transcribes any target
plasmid, such as pET-28b, carrying its gene of interest. Strains used include: BL21(DE3)
(Studier, F.W., Rosenberg, A.H., Dunn, J.J., and Dubendorff, J.W. (1990) Meth.
Enzymol. 185, 60-89).

To express recombinant E. cloacae sequences, 50 nanograms of plasmid DNA
isolated as described above is used to transform competent BL21(DE3) bacteria
(provided by Novagen as part of the pET expression system kit) as described above.
The lacZ gene (beta-galactosidase) is expressed in the pET-System as described for
the E. cloacae recombinant constructions. Transformed cells are cultured in SOC
medium for 1 hour, and the culture is then plated on LB plates containing 25
micrograms/ml kanamycin sulfate. The following day, bacterial colonies are pooled
and grown in LB medium containing kanamycin sulfate (25 micrograms/ml) to an
optical density at 600 nm of 0.5 to 1.0 O.D. units, at which point 1 millimolar IPTG
is added to the culture for 3 hours to induce gene expression of the E. cloacae
recombinant DNA constructions.

After induction of gene expression with IPTG, bacteria are pelleted by
centrifugation in a Sorvall RC-3B centrifuge at 3500 x g for 15 minutes at 4°C.
Pellets are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl
and 0.1 mM EDTA (STE buffer). Cells are then centrifuged at 2000 x g for 20 min
at 4°C. Wet pellets are weighed and frozen at -80°C until ready for protein
purification.

A variety of methodologies known in the art can be utilized to purify the
isolated proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J.
E. Coligan et al., eds., 1995). For example, the frozen cells may be thawed,
resuspended in buffer and ruptured by several passages through a small volume
microfluidizer (Model M-110S, Microfluidics International Corporation, Newton, MA).
The resultant homogenate may be centrifuged to yield a clear supernatant (crude
extract) and, following filtration, the crude extract may be fractionated over columns.
Fractions may be monitored by absorbance at 280 nm, and peak fractions may be
analyzed by SDS-PAGE.

The concentrations of purified protein preparations may be quantified
spectrophotometrically using absorbance coefficients calculated from amino acid
content (Perkins, S.J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations are
also measured by the method of Bradford, M.M. (1976) Anal. Biochem. 72, 248-254,
and Lowry, O.H., Rosebrough, N., Farr, A.L. & Randall, R.J. (1951) J. Biol. Chem.
193, pages 265-275, using bovine serum albumin as a standard.
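
As an illustration of spectrophotometric quantification, the sketch below estimates a molar absorbance coefficient at 280 nm from amino acid content and applies the Beer-Lambert law (A = ε·c·l). The per-residue coefficients used (5500 for Trp, 1490 for Tyr, 125 per disulfide, in M^-1 cm^-1) are widely used approximate values and are an assumption here; they are not necessarily the values of the cited Perkins method. The peptide sequence and molecular weight are hypothetical.

```python
# Approximate per-residue contributions at 280 nm (M^-1 cm^-1); assumed values.
TRP, TYR, CYSTINE = 5500.0, 1490.0, 125.0

def molar_extinction_280(sequence, n_disulfides=0):
    """Estimate the molar absorbance coefficient from Trp, Tyr and disulfide counts."""
    seq = sequence.upper()
    return TRP * seq.count("W") + TYR * seq.count("Y") + CYSTINE * n_disulfides

def protein_conc_mg_per_ml(a280, sequence, mol_weight_da, path_cm=1.0, n_disulfides=0):
    """Beer-Lambert: c = A / (epsilon * l), converted from mol/L to mg/ml."""
    eps = molar_extinction_280(sequence, n_disulfides)
    if eps == 0:
        raise ValueError("no Trp, Tyr or disulfides: A280 cannot be used")
    molar = a280 / (eps * path_cm)
    return molar * mol_weight_da  # mol/L * g/mol = g/L = mg/ml

# Hypothetical protein: two Trp, one Tyr, molecular weight 20 kDa.
print(molar_extinction_280("MWWYK"))
print(round(protein_conc_mg_per_ml(1.249, "MWWYK", 20000.0), 3))
```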
SDS-polyacrylamide gels of various concentrations may be purchased from
BioRad (Hercules, CA, USA), and stained with Coomassie blue. Molecular weight
markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase
(116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2
kDa), ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin
inhibitor (21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more
than routine experimentation, many equivalents to the specific embodiments and
methods described herein. The specific embodiments described herein are offered by
way of example only, and the invention is to be limited only by the terms of the
appended claims, along with the full scope of equivalents to which such claims are
entitled.

[pn:asparaginyl-trna synthetase]
[gn:asns]


[pn:50s ribosomal subunit protein
l11] [gn:rplk]


[pn:50s ribosomal subunit protein l1]
[gn:rpla]


[pn:50s ribosomal subunit protein
l10] [gn:rplj]


[pn:50s ribosomal subunit protein
l14] [gn:rpln]


[pn:50s ribosomal subunit protein l5]
[gn:rple]


[pn:50s ribosomal subunit protein
l24] [gn:rplx]


[pn:ornithine decarboxylase, 
constitutive] [gn:spec] 


[pn:hypothetical protein] [gn:yqga]


[pn:type 4 prepilin-like protein
specific leader peptidase] [gn:hofd]


[pn:bacterioferritin] [gn:bfr] 


[pn:cell division protein ftsa]
[gn:ftsa]


[pn:cell division protein ftsq]
[gn:ftsq]


[pn:stringent starvation protein] 
[gn:sspa] 


[pn: stringent starvation protein b] 
[gn:sspb] 


[pn: hypothetical protein] 


[pn:hypothetical protein] [gn:sola] 


[pn:30s ribosomal subunit protein
s12] [gn:rpsl]


[pn:30s ribosomal subunit protein
s7] [gn:rpsg]


increased glyphosate resistance


[pn:hypothetical transcriptional
regulator in lipa-lipb intergenic
region] [gn:ybef]


[pn:lipoic acid synthetase] [gn:lipa] 


[pn:signal peptidase i] [gn:lepb]


[PN:TrbC] [GN:trbC]
[DE:Escherichia coli plasmid R100-1
TraV (traV), TraR (traR),
OrfG1 (orfG1), OrfH (orfH), OrfI
(orfI), TraC (traC), TrbI (trbI),
TraW (traW), TraU (traU), TrbC
(trbC), TraN (traN), TrbE (trbE) and
TraF (traF) genes, c


[PN:protease] [GN:prtV]
[DE:Vibrio cholerae DNA for hlyA,
hlyB, lipA, lipB and prtV genes.]
[LE:7537] [RE:10296]
[DI:complement]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:magnesium and cobalt transport
protein cora] [gn:cora]


[pn:molybdenum transport protein
mode] [gn:mode]


[pn:putative molybdenum transport
atp-binding protein modf] [gn:modf]


[pn:hypothetical protein] 


[pn:ybgb] [gn:ybgg] 


[pn:multidrug transporter homolog] 
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hypothetical 29.3 kd protein in 
region 2 of sym plasmid (no 1265). 


[pn:hypothetical protein] [gn:ytfb] 


[pn:50s ribosomal subunit protein l9]
[gn:rpli]


[pn:30s ribosomal subunit protein
s18] [gn:rpsr]


[pn:hypothetical 28.7 kd protein in
marb-dcp intergenic region]
[gn:yded]


[pn:dna polymerase iii, delta'
subunit] [gn:holb]


[pn:hypothetical protein in holb-ptsg
intergenic region] [gn:ycfh]


[pn:pts system, glucose-specific iibc 
component] [gn:ptsg] 


[pn:protease iv] 


[pn:unknown] 


[pn:rna polymerase sigma subunit 
rpos] [gn:rpos] 


[pn:lipoprotein nlpd precursor] 
[gn:nlpd] 


[pn:insertion element is911
hypothetical 11.6 kd protein]
[gn:yi91]


dna primase, chains a and b (ec 
2.7.7.-). 


[pn:hypothetical protein] 


[pn:rare lipoprotein a precursor] 
[gn:rlpa] 
[pn:conserved hypothetical protein]


[pn:aerobic respiration control
protein arca] [gn:arca]


[pn:hypothetical 25.3 kd protein in
arca-thrl intergenic region] [gn:last]


[pn:crea protein] [gn:crea]


or:bacteriophage 186 le:181 re:711
di:direct nt:or08; similar to
bacteriophage p2 i protein,


probable tail fibre protein (gph).


[pn:hypothetical phosphotransferase
enzyme ii] [gn:ptxa]


[pn:hypothetical 23.6 kd protein in
aidb-rpsf intergenic region] [gn:yjfv]


[pn:hypothetical 32.0 kd protein in
aidb-rpsf intergenic region]


[pn:hypothetical 52.9 kd protein in
aidb-rpsf intergenic region] [gn:yjfs]


[pn:hypothetical 10.9 kd protein in
aidb-rpsf intergenic region] [gn:yjft]


[pn:hypothetical phosphotransferase
enzyme ii] [gn:ptxa]


[pn:hypothetical protein] [gn:sgae]


[pn:hypothetical 25.6 kd protein in
dnat-hold intergenic region] [gn:yjjr]


[pn:hypothetical protein] [gn:ylab]


[pn:lipoprotein-28 precursor]
[gn:nlpa]


[pn:lipoprotein-28 precursor]
[gn:nlpa]


hypothetical protein 137 - maize
chloroplast


[pn:50s ribosomal subunit protein l4]
[gn:rpld]
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[pn:30s ribosomal subunit protein 
s3] [gn:rpsc] 


[pn:50s ribosomal subunit protein
l16] [gn:rplp]


[pn:50s ribosomal subunit protein l2]
[gn:rplb]


[pn:50s ribosomal subunit protein
l22] [gn:rplv]


[pn:30s ribosomal subunit protein
s17] [gn:rpsq]
[pn:pts system, fructose-specific
iia/fpr component]
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[pn:hypothetical protein, 13.1k] 
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[pn: hypothetical protein] 


[pn:pts system, n-acetylglucosamine- 
specific iiabc component] [gn:nage] 


mucin - human (fragment) | 
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[pn:hypothetical protein] [gn:yhcf] 


[pn: hypothetical protein] 
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pabc-holb intergenic region] 
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subunit] [gn:holb] 
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[pn : integrase/recombinase] 
[gn:ykqm] 


dna primase trac (ec 2.7.7.-) 
(replication primase). 


or:enterobacter aerogenes pn:tnpa
gn:tnpa le:13384 re:14388 di:direct


or:escherichia coli le:627 re:1199
di:complement sr:escherichia coli
dna nt:orf 4; putative


[pn:hypothetical 15.6 kd protein in
cydb-tolq intergenic region] 
[pn:pantothenate kinase] [gn:coaa]
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[pn:dipeptide transport system 
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[pn:l-arabinose isomerase] [gn:arae] 
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[pn:d-galactose-binding periplasmic 
protein precursor] [gn:mglb]


[pn:hypothetical 31.3 kd protein in 
fole-cira intergenic region] [gn:yeig] 


or:synechococcus pcc7942
pn:unknown le:3661 re:4344
di:complement nt:orf227


[pn:colicin i receptor precursor]
[gn:cira]


[pn:31.1 kd protein in msbb-ruvb
intergenic region] [gn:yebl]


[pn:hypothetical 46.7 kd protein in
msbb-ruvb intergenic region]
[gn:yeba]


[pn:msbb protein] [gn:msbb]


[pn:msbb protein] [gn:msbb]


[pn:glucose 6-phosphate
1-dehydrogenase] [gn:zwf]


[pn:glucose 6-phosphate
1-dehydrogenase] [gn:zwf]


[pn:hypothetical 32.0 kd protein in 
pyka-zwf intergenic region] 
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[pn:pyruvate kinase a] [gn:pyka] 
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[pn:hypothetical protein] 


or:corynebacterium glutamicum
pn:unknown gn:orf6 le:3532 re:4290
di:complement


[pn:hypothetical protein in ribc
5' region] [gn:ydhe]


[pn:purine nucleotide synthesis
repressor] [gn:purr]


[pn:hypothetical 43.4 kd protein in
purr-cfa intergenic region] [gn:ydhc]


[pn:cyclopropane-fatty-acyl-
phospholipid synthase] [gn:cfa]


colicin v secretion atp-binding 
protein cvab. 


[pn:hypothetical 20.1 kd protein in 
intf-eaeh intergenic region precursor] 
[gn:yagz] 


[pn: hypothetical 24.5 kd protein in 
intf-eaeh intergenic region precursor] 
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[pn:hypothetical 91.2 kd protein in 
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[pn:hypothetical 60.0 kd protein in 
intf-eaeh intergenic region] 
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[pn:hypothetical 28.2 kd protein in 
intf-eaeh intergenic region] 


[pn:ada regulatory protein] [gn:ada] 


[pn:hypothetical abc transporter in
eco-alkb intergenic region] [gn:yoji]


[pn:outer membrane protein c
precursor] [gn:ompc]


[pn:hypothetical 38.5 kd protein in
ada-ompc intergenic region]
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[pn:l-arabinose-binding periplasmic 
protein precursor] [gn:araf] 


[pn:ferritin-like protein] [gn:yeci] 


[pn:anaerobic c4-dicarboxylate 
transporter dcub] [gn:dcub] 


[pn:hypothetical 51.5 kd protein in 
rbsr-rrsc intergenic region] [gn:yieo] 


[pn:hypothetical protein] 


[pn:hypothetical 37.8 kd protein in
rply-prol intergenic region] [gn:yejk]


[pn:hypothetical 25.9 kd protein in
bcr-rply intergenic region] [gn:yejd]


[pn:bicyclomycin resistance protein]
[gn:bcr]


[pn:hypothetical 12.5 kd protein in
bcr 5' region] [gn:yejg]


protein p60 precursor
(invasion-associated protein).


[pn:hypothetical 66.4 kd protein in
rsua-rply intergenic region] [gn:yejh]


[pn:hypothetical 38.1 kd protein in
bcr 5' region] [gn:yeje]


[pn:hypothetical 40.4 kd protein in
bcr 5' region] [gn:yejb]


[pn:hypothetical abc transporter in
bcr 5' region] [gn:yejf]


hypothetical protein fwdl566 -
escherichia coli


very hypothetical 19.2 kd protein in
bcr 3'region.


[pn:50s ribosomal protein l25]
[pn:xanthine guanine phosphoribosyl
transferase gpt] [gn:hi0692]


[pn:nadh]


[pn:hypothetical protein]


[pn:hypothetical protein dinp]
[gn:dinp] 


[pn:nadh] 
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[pnihypothetical protein] 


[pn:aminoacyl-histidine dipeptidase 
precursor] [gn:pepd] 


[pn:aminoacyl-histidine dipeptidase
precursor] [gn:pepd] 


[pn:excision nuclease abc subunit b] 
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[pn:molybdopterin converting factor, 
subunit 1] [gn:moad] 
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[pn:molybdopterin converting factor, 
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[pn:molybdenum cofactor 
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[pn: hypothetical abc transporter atp- 
binding protein in pepn-pyrd 
intergenic region] [gn:ycbe] 
[pn:cold shock protein cspa]
[gn:cspa]


[pn:hypothetical 26.0 kd protein in
prok-tag intergenic region] [gn:yhjy]
[pn:hypothetical 11.4 kd protein in
pyrf-osmb intergenic region]
[gn:ycih]


[pn:orotidine-5'-p decarboxylase]
[gn:pyrf]


[pn:hypothetical protein] [gn:ycdt] 


„rpff 


[pn:glutamate-1-semialdehyde
2,1-aminomutase] [gn:heml]


[pn:peptidoglycan synthetase]
[gn:mrcb]


[pn:ferrichrome-iron receptor
precursor] [gn:fhua]


[pn:ferrichrome transport protein
fhub precursor] [gn:fhub]


[pn:hypothetical protein in heml-pfs
intergenic region] [gn:yadq]


[pn:ferrichrome-binding periplasmic
protein precursor] [gn:fhud]


[pn:ferrichrome transport atp-
binding protein fhuc] [gn:fhuc]


[pn:vacb protein] [gn:vacb]


[pn:hypothetical 26.6 kd protein in
vacb-aidb intergenic region]


[pn:adenylosuccinate synthetase]
[gn:pura]


[pn:hypothetical 15.6 kd protein in
pura-vacb intergenic region]
[gn:yjeb]


[pn:aidb protein] [gn:aidb]


[pn:hypothetical protein] [gn:yjet]


[pn:methyl-accepting chemotaxis
protein ii] [gn:tar]
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vacb protein. 


[sp:o05543] [de:hypothetical protein
in adhs 5' region (orO) (fragment)]


[pn:proline/betaine transporter] 
[gn:prop] 


[pn:hypothetical 25.0 kd protein in 
tyrp-leuz intergenic region] 


[pn:hypothetical 7.3 kd protein in 
tyrp-rsga intergenic region] 


[pn: regulatory protein for glycine 
cleavage pathway] [gn:gcva] 


[pn:hypothetical protein] 


[pn:high-affinity branched-chain
amino acid transport permease 
protein livm] [gn:livm] 


[pn:tyrosine-specific transport 
protein] [gn:tyrp] 


[pn:high-affinity branched-chain 
amino acid transport permease 
protein livh] [gn:livh] 


[pn:high-affinity branched-chain 
amino acid transport atp-binding] 
[gn:livf] 


[pn:ferritin-like protein] [gn:ftn] 


[pn:leu/ile/val-binding protein 
precursor] [gn:livj] 


[pn:high-affinity branched-chain 
amino acid transport atp-binding 
protein livg] [gn:livg] 


[pn:glpg protein] [gn:glpg] 


[pn:protein] [gn:glpe] 


[pn:glycerol-3-phosphate regulon
repressor] [gn:glpr] 
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[pn:maltodextrin phosphorylase] 
[gn:malp] 


[pn:4-alpha-glucanotransferase] 
[gn:malq] 


acidic proline-rich protein precursor 
(clone prp33). 


[pn:malt] [gn:malt] 


[pn:high-affinity gluconate 
transporter] [gn:gntt] 


[pn:hypothetical protein] [gn:yajo]


[pn:hypothetical protein] [gn:ycan]


[pn:hypothetical protein] [gn:yedu]


hypothetical protein (argf-lacz
region) - escherichia coli


[pn:gals] [gn:galr]


[pn:hypothetical protein]


[pn:chromosome segregation protein]


[pn: hypothetical protein] [gn:ycan] 


[pn:aldo-keto reductase, putative] 


gluconolactonase precursor (ec 
3.1.1.17) (d-glucono-delta-lactone 
lactonohydrolase). 


[pn:hypothetical protein] 


[pn:fatty acid-fatty acyl responsive 
dna-binding protein] [gn:fadr] 


[pn:d-amino acid dehydrogenase] 
[gn:dada] 


[pn:alanine racemase, catabolic 
precursor] [gn:dadx] 
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[pn:glutamate/aspartate transport 
system permease protein gltj] 
[gn:gltj] 


[pn:glutamine abc transporter] 


[pn:glutamine abc transporter] 


[pn:pyruvate kinase] [gn:pykf]


[pn:cystathionine beta-synthase]
[gn:cys4]


[pn:cystathionine gamma-synthase]
[gn:metb]


[pn:hypothetical protein]


or:bacteriophage phi-105 pn:holin
le:796 re:1170 di:direct
sr:bacteriophage phi-105 dna nt:orf2;
potential dual start motif; putative


[pn:hypothetical protein]


[pn:atp-dependent clp protease
proteolytic component] [gn:clpp]


major capsid protein precursor (gp5)
(head protein).


[pn:hypothetical protein]


hypothetical protein - phage phi-80 


unknown„mtcy336.26 ) mtcy336.26. 
len 


portal protein (gp3). 


[pn:hypothetical protein] 


[pn:unknown sensor protein in 
terminator region] [gn:rstb] 
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[pn:hypothetical protein) [gn:ipa- 
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[pn:outer membrane protein f 
precursor] [gn:ompf] 


[pn:hypothetical protein in kdsb-kicb 
intergenic region] [gn:ycbc] 


[pn: aspartate aminotransferase] 
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[pn:hypothetical 29.8 kd protein in 
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[pn hypothetical protein] [gn:ycbk] 
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[pn:methionyl-trna synthetase] 
[gn:metg] 


[pn:hypothetical transcriptional 
regulator in molr-bglx intergenic 
region] [gn:yehv] 


[pn:d-lactate dehydrogenase] 
[gn:dld] 


phosphinothricin-resistance protein 
(ptc-resistance protein). 


[pn:hypothetical 37.7 kd protein in 
psd-amib intergenic region] 


[pn:fumarate reductase, membrane 
anchor polypeptide] [gn:frdc] 


[pn:beta-lactamase precursor] 
[gn:ampc] 


[pn:phosphatidylserine 
decarboxylase proenzyme] [gn:psd] 


[pn:fumarate reductase flavoprotein 
subunit] [gn:frda] 


[pn:fumarate reductase, membrane 
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[pn:imidazole] [gn:hisc] 


[pn:phosphoribosylformimino-5- 
aminoimidazole carboxamide 
ribotide] [gn:hisa] 


or:salmonella typhimurium le:1
re:>173 di:direct nt:chimeric protein
of hisf and hisie genes (57 aa)


hypothetical 37.6 kd protein in eld
5'region (orf2).


[pn:histidinol dehydrogenase]
[gn:hisd]


[pn:amidotransferase] [gn:hish]


[pn:phosphoribosyl-amp
cyclohydrolase / phosphoribosyl-atp
pyrophosphohydrolase] [gn:hisi]


or:caenorhabditis elegans pn:talin
le:2 re:7663 di:direct
sr:caenorhabditis elegans (strain
bristol) (tissue library: whol


[pn:hypothetical protein] [gn:ybbp]


[pn:hypothetical protein] [gn:ybbl]


[pn:hypothetical abc transporter]
[gn:ybba]


or:escherichia coli le:133380
re:134066 di:direct nt:hypothetical
protein


[pn:hypothetical protein] [gn:ybbm] 
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[pn:hypothetical protein in cyss-fold 
intergenic region] [gn:ybci]


[pn:phosphoribosylaminoimidazole
carboxylase catalytic subunit]
[gn:pure]


[PN:3E1 protein] [DE:Entamoeba
histolytica mRNA for 3E1 protein.]
[LE:32] [RE:418] [DI:direct]


[pn:hypothetical protein] [gn:ybbo]


[pn:hypothetical protein] [gn:ybbk]


[pn:hypothetical 7.4 kd protein in
cyss-fold intergenic region] [gn:ybcj]


[pn:hypothetical protein] [gn:ybbn]


[pn:methylenetetrahydrofolate
dehydrogenase] [gn:fold]


[pn:repressor protein] [gn:mali] 


[pn:peptidyl-prolyl cis-trans 
isomerase b] [gn:ppib] 


[pn:hypothetical 26.9 kd protein in 
pure-ppib intergenic region] 


[pn:phosphoribosylaminoimidazole 
carboxylase atpase subunit] 
[gn:purk] 


[pn:hypothetical 41.1 kd protein in
rhsd-gcl intergenic region] [gn:ybbb]


[pn:acyl-coa thioesterase i] [gn:tesa]


[pn:hypothetical 18.4 kd protein in
cpsb 5' region] [gn:yefc]


[pn:phosphomannomutase]
[gn:manb]


[pn:hypothetical protein]
[pn:hypothetical 36.1 kd protein in
cpsb 5' region] [gn:yefb]


[pn:utp-glucose-1-phosphate
uridylyltransferase] [gn:galf]


[pn:gdp-mannose 4,6-dehydratase]
[gn:yefa]


[pn:hypothetical 44.9 kd protein in
cpsb 5' region] [gn:yefd]


[pn:mannose-1-phosphate
guanylyltransferase] [gn:manc]


[pn:hypothetical protein] [gn:wcaj]


[pn:hypothetical protein] [gn:wcal]


[pn:hypothetical protein] [gn:wcam]


[pn:putative 3-beta-hydroxysteroid
dehydrogenase]


[pn:similarity to mucin proteins,
ykl224c, stalp] [gn:j2223]


[pn:hypothetical nadph
oxidoreductase in fixc-kefc
intergenic region] [gn:yabf]


[pn: hypothetical protein] 


[pn:glutathione-regulated potassium- 
efflux system protein kefc] [gn:kefc] 


[pn:dihydrofolate reductase type i] 
[gn:fola] 


[pn:carbamoyl-phosphate synthase
small chain] [gn:cara]


[pn:carbamoyl-phosphate synthase
large chain] [gn:carb]


[pn:survival protein sura precursor]
[gn:sura]


or:helicobacter pylori pn:orf33
le:34041 re:34685 di:direct 


[pn:organic solvent tolerance protein 
precursor] [gn:imp] 
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[pn .hypothetical protein] 


[pn:trp repressor binding protein] 
[gn:wrba] 


[pn:hypothetical 8.5 kd protein in
agp 3' region] [gn:yccj]
udp-rfah intergenic region] [gn:yigo]


[pn:hypothetical 63.2 kd protein in
udp-rfah intergenic region]


[pn:hypothetical 29.0 kd protein in
udp-rfah intergenic region] [gn:yigu]


[pn:hypothetical 11.0 kd protein in
bisc-cspa intergenic region]


[pn:transcriptional activator]
[gn:rfah]


[pn:hypothetical protein] 


[pn:hypothetical protein] 
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[pn: potential acrab operon repressor] 
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[pnihypothetical protein] 


[pn:hypothetical outer membrane
usher protein in agai-mtr intergenic
region] [gn:yraj]


[pn:fimg protein precursor]
[gn:fimg]


[pn:hypothetical protein]


[pn:glutamine-binding periplasmic
protein precursor] [gn:glnh]


[pn:glutamine transport system
permease protein glnp] [gn:glnp]


[pn:type 1 fimbrial subunit]
[gn:fima]


[pn:hypothetical 25.7 kd fimbrial
chaperone in agai-mtr intergeni]
[gn:yrai]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical 9.8 kd protein in
ding/rarb 3' region] [gn:ybii]


[pn:hypothetical 8.6 kd protein in
ding/rarb 3' region] [gn:ybij]


[pn:hypothetical transcriptional
regulator in moae-rhle intergenic
region] [gn:ybih]


[pn:hypothetical protein]


[pn:putative atp-dependent rna
helicase] [gn:rhle]


[PN:3E1 protein] [DE:Entamoeba
histolytica mRNA for 3E1 protein.]
[LE:32] [RE:418] [DI:direct]
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[pn:hypothetical protein] 


or:escherichia coli pn:hypothetical
4-hydroxyphenylacetate permease
gn:hpax le:6734 re:8110 di:direct


or:escherichia coli pn:regulator of
the 4hpa-hydroxylase operon
gn:hpaa le:8120 re:9007 di:direct


4-hydroxyphenylacetate 3-
monooxygenase (ec 1.14.13.3)
large chain - escherichia coli (atcc
11105)


or:azospirillum brasilense gn:carr
le:<1 re:588 di:direct


[pn:methyl-accepting chemotaxis
protein i] [gn:tsr]


homoprotocatechuate degradative 
operon repressor. 


[pn:c4-dicarboxylate transport 
protein] [gn:dcta] 


[pn:53.1 kd protein in kdgk-dcta 
intergenic region precursor] [gn:yhjj] 


[pn:hypothetical 29.7 kd protein in
tref-kdgk intergenic region]


[pn:3-oxoacyl-acyl-carrier protein
reductase] [gn:ylpt]


[pn:hypothetical transcriptional
regulator in tref-kdgk intergenic
region] [gn:yhjb]


[pn:oligopeptidase a] [gn:prlc]


[pn:hypothetical 13.0 kd protein in
pit-uspa intergenic region] [gn:yhio]


[pn:hypothetical 43.8 kd protein in
rhsb-pit intergenic region] [gn:yhin]


[pn:hypothetical protein in uspa-prlc
intergenic region] [gn:yhiq]
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[pn:hypothetical 37.9 kd protein in 
tref-kdgk intergenic region] 


[pn:hypothetical 31.9 kd protein in 
prlc-gor intergenic region] [gn:yhir] 


[pn:hypothetical protein] 


[pn:hypothetical transcriptional
regulator in tref-kdgk intergenic
region] [gn:yhjc]


[pn:2-dehydro-3-
deoxygluconokinase] [gn:kdgk]


hypothetical 26.4k protein -
pseudomonas aeruginosa


[pn:araj protein precursor] [gn:araj]


[pn:hypothetical transcriptional
regulator in tref-kdgk intergenic
region] [gn:yhjc]


[pn:osmotically inducible protein c]
[gn:osmc]


[pn:hypothetical protein]


skin secretory protein xp2 precursor 
(apeg protein). 


[pn:hypothetical protein] 


[pn:hypothetical protein] 


[pn:hypothetical 15.6 kd protein in 
pura-vacb intergenic region] 
[gn:yjeb] 


[pn:hypothetical protein] 


[pn:hypothetical protein] 
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regulator in tref- kdgk intergenic 
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[pn:nad-linked malic enzyme] 
[gn:sfca]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical 32.0 kd protein in
pyka-zwf intergenic region]
[gn:yebk]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:hypothetical protein]


dehydrogenase,, mtcy 1 6f9.02,mtcy 1 6 
19.02, probable dehydrogenase, len 


[pn:excision nuclease abc subunit b] 
[gn:uvrb] 


[pn:purine ntpase homolog] 


hypothetical 29.3 kd protein in 
region 2 of sym plasmid (no 1265). 


[de:yersinia pestis plasmid ppcp1,
complete plasmid sequence.]
[pn:transposase]


vagc protein - salmonella dublin
virulence plasmid


hypothetical 27.4 kd protein in hyrl
3'region.


[pn:flagellar transcriptional activator
flhd] [gn:flhd]


On 


ydiF 


so 
OO 

-3- 

lo 


CO 

to 
oo 


oo 
oo 
"d- 


b1487


oo 
2} 


yobT 


oo 
x> 


oo 
r- 


Z96073 


b0779 


H69378 


P50360 


AF053945 


S22685 


P40586 


b1892


Escherichia 
coli 


Bacillus 
subtilis 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Bacillus 
subtilis 


Escherichia 
coli 


Escherichia 
coli 


Mycobacteriu 
m tuberculosis 


Escherichia 
coli 


Archaeoglobus 
fulgidus 


Rhizobium sp. 


Yersinia pestis 


Salmonella 
dublin 


Saccharomyce 
s cerevisiae 


Escherichia 
coli 


3.5(10)-274


4.9(10)-52


2.2(10)-151


1.1(10)-5


4.2(10)-65


1.5(10)-234


1.8(10)-126


0.03699


1.7(10)-27


6.7(10)-138


4.4(10)-54


1.5(10)-181


6.7(10)-5


2.5(10)-25


1.7(10)-162


6.0(10)-25


2.2(10)-36


© 

1 

SO 
CM* 


2635 


to 

wo 


1476 


so 
CN 


CN 
so 
so 


2261 


1241 


ti- 
es 


o 
ro 


1349 


oo 
v-> 
wo 


1761 


CO 

CO 


CO 
On 
CN 


1581 


CO 
OO 
CN 


On 

CO 


to 


w> 
On 
to 


CN 

to 


o 

•O 

CO 


ON 
OO 
CN 


wo 
On 


so 
to 


oo 

CO 


o 

CO 


ON 
OO 


? 

ro 


r- 
o 


o 
r-» 


CO 

o 

On 


r- 


o 
wo 
ro 


OO 


wo 

CM 


Os 
OO 


1785 


1716 


1050 


r- 

SO 
OO 


wo 
oo 
wo 


1548 


1044 


o 

CO 
On 


r-» 
so 

CN 


1023 


1221 


2112 


2709 




1050 


ro- 
CM 


CM 
SO 

r- 


r- 

SO 
CM 


8027 


8028 


8029 


8030 


8031 


8032 


8033 


8034 


8035 


8036 


8037 


8038 


8039 


8040 


8041 


8042 


8043 


8044 


2365 


2366 


2367 


2368 


2369 


2370 


2371 


2372 


2373 


2374 


2375 


2376 


2377 


2378 


2379


2380 


2381 


2382 


to 

o' 

SO 

to 

On 


SO 

II 1 

CO 

co 

CO 

co 


22010303_c2_128 


35836461_c3_147


16692842_c3_148


12386275_c3_149 


32656630_c3_153 


so 

^1 

CO 

o 
oo 

CO 
I/O 
OS 
WO 


CO 
so 

~l 

CO 

"l 

CN 
O 
WO 


22464212_c3_167


oo 
so 

1 

CO 

vc' 

ON 

wo 

CN 

SO 
co 


22132932J3J1 


wo 
r- 

c 1 

ro 
OO 

wo 

CO 
CO 

wo 
co 


o 

!!• 

<_> 

Os" 
CN 
CM 

r- 


2047880_c1_81


5286516_c1_88


2932082_c1_90


5 

r- 
r- 
to 

SO 
CM 


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG452


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG452 


CONTIG453 


CONTIG453 


CONTIG453


CONTIG453


CONTIG453 


CONTIG453 


CONTIG453 






[pn:alcohol--acetaldehyde 
dehydrogenase] [gn:adhc] 


[pn:flagellar hook flge] [gn:flge] 


[pn:hypothetical protein]


[pn:excinuclease abc]


[pn:hypothetical protein]


[de:yersinia pestis plasmid ppcp1,
complete plasmid sequence.]
[pn:transposase]


vagd protein - salmonella dublin
virulence plasmid


[pn:hypothetical protein]


[pn:hypothetical protein] [gn:yaim]


[pn:rod shape-determining protein
mred] [gn:mred]


[pn:hypothetical 21.5 kd protein in
cafa-mred intergenic region]
[gn:yhde]


[pn:tldd protein] [gn:tldd]


[pn:hypothetical protein] [gn:yhcr]


[pn:hypothetical 73.6 kd protein in
argr-cafa intergenic region]
[gn:yhcp]


[pn:malate dehydrogenase] [gn:mdh]


[pn:rod shape-determining protein
mrec] [gn:mrec]


[pn:cytoplasmic axial filament
protein] [gn:cafa]


[pn:hypothetical 107.7 kd protein in
argr-cafa intergenic region] 


so 
wo 
co 
o 

JD 


HP0870 


HI0184 


uvrA 


wo 
ro 

O - 

JD 


AF053945 


S22686 


r- 

wo 
co 
o 

JD 


wo 
wo 

co 

o 

JD 


b3249 


b3248 


b3244 


CN 

T* 

<N 
CO 
JD 


o 

Tl- 

CN 
CO 
JD 


SO 
CO 
CN 
CO 
JD 


o 
wo 

CN 

CO 

JD 


b3247 


wo 

Tl- 

CN 
CO 
JD 


Escherichia 
coli 


Helicobacter 
pylori 


Haemophilus 
influenzae 


Bacillus 
subtilis 


Escherichia 
coli 


Yersinia pestis 


Salmonella 
dublin 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


1.8(10)-167


wo 
o 


so 

O 
CO 


no 
SO 

O 

Tf 

WO 


oo 
co 

O 

ro 


oo 

CN 

o 
oo 


co 

CN 
■ 

O 


2.3(10)-21


SO 
■ 

o 

wo 
CN 


2.1(10)-72 


so 

oo 

1 

o 
r- 


CN 
CN 
■ 

O 

CO 
CO 


Tl- 

co 
1 

o 
oo 

ON 


r— 

CO 

CN 
O 

CO 
CO 


6.7(10)-145


CO 

CN 
O 

Tj- 

r-i 


so 

CN 
CN 

O 

SO 
CO 


o 


1628 


ON 


co 
o 

CN 


1136 


CN 

Tt 


1260 


CN 


ON 
CN 


CO 

oo 

SO 


co 
r- 


Tt 

SO 
OO 


2135 


SO 
SO 

CO 


2286 


1415 


1207 


2182 


3981 


co 


Tt 

WO 


Tj- 

CN 


wo 
oo 
oo 


co 

ON 


CO 
so 
CN 


co 


SO 

ON 


r- 

o 

CN 


<N 


o 

CN 


CN 

oo 

Tl- 


r— 


SO 


CN 
CO 


o 

Tt 

co 


CN 
O 
wo 


1283 


ro 


1362 


CN 

r- 

HO 


2655 


ON 

r- 

CN 


TT 

O 
OO 


ON 
CN 


OO 

oo 

CN 


CN 

SO 


co 
co 

SO 


CO 

o 

SO 


1446 


co 
CN 


2031 


CO 
so 
On 


1020 


1506 


3849 


8045 


8046 


8047 


8048 


8049 


8050 


8051 


8052 


8053 


8054 


8055 


8056 


8057 


8058 


8059 


8060 


8061 


8062 


2383 


2384 


2385 


2386 


2387 


2388 


2389 


2390 


2391 


2392 


2393 


2394 


2395 


2396 


2397 


2398 


2399 


2400 


10744011_c1_95


33644826_c1_97


oo 

«*. 

2' 

o 

<N 

wo 

WO 
CN 
SO 
CN 


o 
co 

~" 1 

CN 

£ 

O 
wo 

SO 

o 

ro 
tJ- 
<N 


33697188_c2_132


22672286_c3_135


24025381_c3_144


r- 

Tt 
— I 

ro 

co 

CO 

r- 

<N 

r- 
o 


19767826_c3_149 


1 

r- 
wo 

ON 
CO 
CN 
OO 

wo 


.j 

CO 
OO 
Tl" 
OO 
ON 
CO 


24627202_f1_10


10179131_f1_11


36042040_f1_14


23614003_f1_17


co 

CJ 

WO 

r- 
oo 
o 

CO 


14453433_f2_35 


16026457_f2_36 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG453 


CONTIG454 


CONTIG454


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 






[pn:hypothetical 34.8 kd protein in 
argr-cafa intergenic region] 
[gn:yhcq] 


[pn:hypothetical 73.3 kd protein in 
mreb-accb intergenic region] 
[gn:yhda] 


[pn:rod shape-determining protein 
mreb] [gn:mreb] 


[pn:succinate-semialdehyde 
dehydrogenase] [gn:gabd] 


[pn:hypothetical protein] [gn:yhco]


[pn:protease precursor] [gn:degs]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:protease precursor] [gn:degq]


[pn:arginine repressor] [gn:argr]


[pn:hypothetical 34.7 kd protein in
mreb-accb intergenic region]
[gn:yhdh]


[pn:hypothetical 15.2 kd protein in
rplm-hhoa intergenic region]
[gn:yhcb]


[pn:arginine repressor] [gn:argr]


[pn:hypothetical protein] [gn:yhcn]


or:azospirillum brasilense gn:carr
le:59 re:580 di:direct nt:orf2


[pn:hypothetical protein] [gn:yhcs]


[pn:hypothetical 65.0 kd protein in 
hupb-cof intergenic region] 


CN 

m 

x> 


CN 
m 

m 


CN 

m 
XI 


b2661 


On 
ro 
CN 

m 
X> 


m 
r-> 
CN 

m 
X) 


bl97l 


CN 

o\ 

X) 


b3234 


b3237 


m 
m 

CN 

m 
X) 


m 
m 
CN 
m 
X 


b3237 


oo 
m 

CN 
CO 
XI 


X70360 


b3243


m 
o 

XJ 


Escherichia 
coli 


Escherichia 
coli 


Escherichia
coli


Escherichia
coli


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Azospirillum 
brasilense 


Escherichia 
coli 


Escherichia 
coli 


6.7(10)-145


8.8(10)-278


1.5(10)-188


• 

o 

o< 


5.9(10)-27 


3.1(10)-133 


2.6(10)-145


1.1(10)-27


7.9(10)-167


4.2(10)-8


1.3(10)-130


3.6(10)-52


6.4(10)-62


1.3(10)-13 


CN 

■ 

o 

i"rT 
CN 


1.5(10)-147 


3.1(10)-220 


1415 


2669 


1827 


1376 


CN 
O 
c-> 


1305 


1419 


Os 
O 
c~> 


1622 


CN 


1280 


o 
m 


CN 

m 
vO 


VO 


VO 


1440 


2126 


o 

CN 

m 


vO 
VO 


o 

OS 

m 


oo 

V~l 
IT) 


m 
O 


m 
r- 
m 


'3- 

m 

rO 


OO 


m 
r- 


m 


oo 

CN 

m 


oo 
m 


VO 


o 

Os 


Os 


o 

m 


m 


o 

vo 

ON 


1983 


1170 


1674 


in 

CO 


1125 


1062 


m 

CN 


1425 


Os 
CN 


Ti- 
oo 

Os 


^1* 
r- 


oo 
m 
tj- 


o 

CN 


r- 
m 
m 


o 
m 
Os 


1731 


8063 


8064 


8065 


9908 


8067 


8068 


8069 


8070 


8071 


8072 


8073 


8074 


8075 


8076 


8077 


8078 


8079 


2401 


2402 


2403 


2404 


2405 


2406 


2407 


2408 


Os 
O 

CN 


2410 


2411 


2412 


2413 


2414 


2415 


2416 


2417 


30208260_f2_41 


m 
m 

c' 

o 1 

ti- 
m 
r- 
^- 

vo 

V~i 


32230311_f3_56


o 

1 

a 

i 

m 

CO 

oo 

CN 
in 


22917932J3J71 


23650765_c1_77


16265886_c1_110


14455203_c1_111


6070136_c2_112


34244001_c2_116


vO 
CN 

°i 

OO 
VO 
m 
^* 


12585313_c3_149


24220842_c3_150


24644052_c3_151


13683312_c3_155


32244052_c3_161


23642302_f1_27


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454 


CONTIG454


CONTIG455 







[pn:hypothetical protein] [gn:ybav]


[pn:hypothetical protein]


[pn:cof protein] [gn:cof]


[pn:nitrogen regulatory protein p-ii]
[gn:glnk]


[pn:hypothetical 41.9 kd protein in
fucr-gcva intergenic region]
[gn:ygde]


[pn:hypothetical protein]


[pn:hypothetical protein] [gn:ygcy]


[pn:hypothetical protein] [gn:ygcx]


[pn:ctp synthase] [gn:pyrg]


[pn:pts system, maltose and glucose-
specific ii abc component] [gn:malx]


[pn:hypothetical protein]


[pn:hypothetical rna
methyltransferase in rela-bara
intergenic region] [gn:ygca]


[pn:gtp pyrophosphokinase] [gn:rela]


[pn:enolase] [gn:eno]


[pn:regulatory protein for glycine
cleavage pathway] [gn:gcva] 


[pn:hypothetical 14.3 kd protein in 
fucr-gcva intergenic region] 
[gn:ygdd] 


[pn:syd] [gn:syd] 


[pn:hypothetical protein] [gn:yqcb] 


CN 

^i- 

o 
x> 


HI1419 


b0446 


o 
to 
^1- 

o 

X) 


b2806 


CN 

ON 

CN 

X) 


OO 
OO 

r- 
CM 
X) 


b2787 


b2780 


CM 

so 

X) 


On 
OO 

r- 

CM 

X) 


to 

OO 

r- 

CN 
X) 


CO 

r-- 

CM 

X) 


b2779 


OO 

o 

OO 
CM 

X) 


r- 
o 

OO 
CM 
XJ 


ro 

ON 

r- 

CM 
X) 


On 

r- 

CM 
X) 


Escherichia 
coli 


Haemophilus 
influenzae 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


7.0(10)-31


1.0(10)-8


4.7(10)-112


3.6(10)-52


r- 
oo 

• 

o 
to 


>o 
ro 

O 

to 

CN 


CM 
CM 
CN 
i 

O 
ro 


On 

CN 
i 

O 
VO 


4.0(10)-266


2.5(10)-74


4.0(10)-225


r- 

NO 

o 
o 

ro 


o 


2.7(10)-205


to 
to 

o 


6.5(10)-60


6.7(10)-67


2.7(10)-116


On 
co 
ro 


o 

ro 


1105 


o 
*o 


1812 


oo 
ro 


2148 


2115 


2559 


On 


2172 


1626 


3496 


1985 


1509 


ro 
SO 


On 

r- 
so 


1145


so 


CN 


OO 
CN 


to 

CN 


ro 
r- 
ro 


to 


VO 


o 
to 


to 
to 


o 
to 


ro 

SO 


CM 
CM 

*o 


o 

SO 
r- 


r- 
•o 


CM 
ro 


to 


CM 
O 
CN 


so 
CM 


o 
to 


OO 
ro 


ro 
oo 


to 

r- 
ro 


1119 


CN 
NO 

^i- 


ro 
OO 
ro 


o 
to 
ro 


ro 
to 
so 


o 
to 
ro 


On 
OO 
ro 


NO 

SO 
»o 


2280 


1371 


so 
ro 
On 


to 
ro 


SO 

o 

SO 


CN 
On 

r-- 


8099 


8100 


8101 


8102 


8103 


8104 


8105 


8106 


8107 


8108 


8109 


8110 


8111


8112 


8113 


8114 


8115 


8116 


2437 


2438 


2439 


2440 


2441 


2442 


2443 


2444 


2445 


2446 


2447 


2448 


2449 


2450 


2451 


2452 


2453 


2454 


24344391_c3_166


34257212_c3_168


o 
r- 

~l 
ro 

NO 

o 

CN 
O 


88645 l_c3_l 77 


1 7069787 Jl J 


14339808_f1_12


24726386_f1_18


14704837_f1_19


1 44075 1J1J9 


l£ U 000096 


ON 
^1 

cs 

CN 
OO 
CN 

o 

OO 
CM 


so 
CN 
OO 

CO 

to 
ro 


CM 

ro 
to 

CM 
SO 
OO 
OO 

to 


16208537_f2_66 


22147011_f3_71


15868766_f3_72 


33750181_r3_83 


14880066_f3_85 


CONTIG455 


CONTIG455 


CONTIG455


CONTIG455 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 


CONTIG456 




[pn:hypothetical 15.1 kd protein in
[pn:hypothetical protein in heml-pfs
[pn:hypothetical 29.4 kd protein in
[pn:hypothetical protein in heml-pfs
intergenic region] [gn:yadq]


[pn:deoxyguanosinetriphosphate
kinase] [gn:pyrh]
[pn:hypothetical protein in cdsa
[pn:phosphotransferase] [gn:frwd]


[pn:n-acetyl-gamma-glutamyl- 
phosphate reductase] [gn:argc] 


[pn:acetylglutamate kinase] 
[gn:argb] 


[pn:argininosuccinate lyase] 
[gn:argh] 


[pn:hypothetical 26.6 kd protein in
udha-trma intergenic region]


[pn:hypothetical 13.0 kd protein in
udha-trma intergenic region]


[pn:cystathionine gamma-synthase]
[gn:metb]


[pn:5,10 methylenetetrahydrofolate
reductase] [gn:metf]


[pn:catalase hydroperoxidase i]
[gn:katg]


[pn:phosphotransferase] [gn:frwc]


[pn:probable pyruvate formate-lyase
2 activating enzyme] [gn:pflc]


[pn:vitamin b12 receptor precursor]
[gn:btub]


[pn:hypothetical protein] 


[pn:2-keto-3-deoxy gluconate 
permease] [gn:kdgt] 


[pn:protein] [gn:phnb] 


[pn:carbon phosphorus lyase]
[gn:phnd] 


[pn:phnh protein] [gn:phnh] 


[pn:phnj protein] [gn:phnj] 


[pn:phosphonates transport atp-
binding protein phnn] [gn:phnn]
[pn:ribose abc transporter]


[pn:fructose-1,6-bisphosphate
aldolase] [gn:tsr]


[pn:hypothetical 25.1 kd protein in
gltp-fdhf intergenic region] [gn:yjco]


[pn:hypothetical 25.1 kd protein in
gltp-fdhf intergenic region] [gn:yjco]


[pn:phna protein] [gn:phna]


[pn:phnf protein] [gn:phnf] 


[pn:phng protein] [gn:phng] 


[pn:phnm protein] [gn:phnm] 


[pn:phno protein] [gn:phno] 


[pn:phnp protein] [gn:phnp] 


[pn:ribose abc transporter] 


[pn:ribose abc transporter]


[pn:phosphonates transport atp- 
binding protein] [gn:phnc] 


[pn:phne] 


[pn:phni protein] [gn:phni] 


[pn:phosphonates transport atp- 
binding protein phnk] [gn:phnk] 


[pn:phosphonates transport atp- 
binding protein phnl] [gn:phnl] 


[pn:hypothetical protein] 


rbsC 


fbaA 


b4078 


oo 
r- 
o 


oo 
o 


b4102 


b4101 


»o 

Os 

<=> 
X) 


b4093 


b4092 


rbsA 


rbsB 


b4106 


b4104 


Os 

OS 

o 
-O 


OS 

o 
X) 


b4096 


ydjE 


Bacillus 
subtilis 


Bacillus 
subtilis 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Bacillus 
subtilis 


Bacillus 
subtilis 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Bacillus 
subtilis 


7.9(10)-48


6.2(10)-55


oo 
ro 
• 

o 

ro 


8.0(10)-23


to 
to 

O 
co 


o 

VO 
CM* 


o 
to 

1 

oo 

CM 


VO 


oo 
to 

O 

to 

CM* 


8.0(10)-110


■ 

o 

ro 


3.8(10)-30


3.5(10)-116


OS 

to 

1 

o 
r— 

CM 


o 

VO 
1 

<o 

ro 


CO 
1 

o 
vq 


o 
o 

ro 


2.0(10)-8


OS 

OS 


VO 
VO 
to 


CM 


ro 

vo 

CM 


oo 

vo 

to 


1098 


CM 
CM 

to 


1630 


OO 
Os 
to 


1084 


1154 


CM 
ro 
ro 


1144 


r- 
o 

vO 


1563 


1119 


1003 


CM 

to 


o 

VO 

co 


oo 
oo 

CM 


VO 


ro 

vo 


vo 


CO 

to 

CM 


to 


CM 
CM 


oo 
to 

CM 


VO 
to 

CM 


VO 
to 


VO 
ro 
ro 


CM 

r- 

CM 


CM 

OO 
CM 


<=> 

OS 

ro 


o 
oo 

CM 


o 

ro 
CM 


VO 
CO 


1080


vo 
oo 


CM 
OS 

"3- 


OS 
OO 


CM 
Os 


OS 

»o 

r- 


co 
to 


1266 


r- 


OO 
VD 

r- 


1548 


1008 


VO 
OO 


VO 

oo 


1170 


OO 


O 
OS 

VO 


1083 


8822 


8823 


8824 


8825 


8826 


8827 


8828 


8829 


8830 


8831 


8832 


8833 


8834 


8835 


8836 


8837 


8838 


8839 


3160 


3161 


3162 


3163 


3164 


3165 


3166 


3167 


3168 


3169 


3170 


3171 


3172 


3173 


3174 


3175 


3176 


3177 


17054132J1J7 


884702J1J9 


to 

-I 

oo 

co 

VO 
CM 


32612590 J1J6 


»o 
Ci' 

OO 

to 
r-~ 
oo 

Os 

o 


vO ( 

C! ( 

ro 
oo 

«o 

CM 
CM 


CM 

G , 

r- 

CM 
OO 

to 
oo 
r- 
to 

CM 


Ci 

CM 

VO 

to 

to 

oo 
ro 
co 


o ■ 
r- 

a 

i 

oo 
to 

CM 

r- 

CM 
Os 
CM 


3958592_f2_71


to 

a 

i 

CM 

VO 
VO 


r-» 

^1 
Ci 

i 

o 

OO 

to 

VO 
VO 
OO 
ro 


19567043_O_99 


6767917_f3_102


901 0 80019991 1 


35283517_f3_108 


4488588_f3_109 


ro 
CM 

c ! 

r- 

VO 

«o 
oo 
o 
t-- 

OS 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 


CONTIG482 






[pn:sn-glycerol-3-phosphate 
transport system permease protein] 


[pn:sn-glycerol-3-phosphate 
transport atp-binding protein] 


[pn:gtnukr operon regulator] 
[gn:gntr] 


[pn:thermoresistant glucokinase] 
[gn:gntk] 


[pn:gntu_1]


[pn:glycogen operon protein glgx] 
[gn:glgx] 


[pn:leucine-specific binding protein 
precursor] [gn:livk] 


[pn:high-affinity branched-chain 
amino acid transport permease 
protein livm] [gn:livm] 


[pn:high-affinity branched-chain 
amino acid transport atp-binding] 
[gn:livf] 


[pn:1,4-alpha-glucan branching
enzyme] [gn:glgb] 


[pn:glycogen synthase] [gn:glga] 


[pn:alpha-glucan phosphorylase] 
[gn:glgp] 


[pn:high-affinity branched-chain 
amino acid transport permease 
protein livh] [gn:livh] 


[pn:sn-glycerol-3-phosphate 
transport system permease protein] 


[pn:glycerophosphoryl diester 
phosphodiesterase] [gn:ugpq] 


[pn:gamma-glutamyltranspeptidase] 
[gn:ggt] 


[pn:hypothetical 38.8 kd protein in
gntr-ggt intergenic region] [gn:yhhx]


[pn:hypothetical 26.3 kd protein in
gntr-ggt intergenic region]
[pn:endonuclease viii, dna n-
glycosylase with an ap lyase activity]
[gn:nei]


[pn:succinate dehydrogenase 
flavoprotein subunit] [gn:sdha] 


[pn:succinate dehydrogenase iron- 
sulfur protein] [gn:sdhb] 


[pn:e2] [gn:sucb] 


[pn:succinyl-coa synthetase alpha 
chain] [gn:sucd] 


[pn:ybgb] [gn:ybgg] 


[pn:hypothetical protein] [gn:ybgi] 


[pn:succinate dehydrogenase 13 kd 
hydrophobic protein] [gn:sdhd] 


[pn:2-oxoglutarate dehydrogenase e1
component] [gn:suca] 


[pn:succinyl-coa synthetase beta 
chain] [gn:succ] 


[pn:heat-responsive regulatory 
protein] [gn:hrsa] 


[pn:potassium-transporting atpase, a 
chain] [gn:kdpa] 


[pn:potassium-transporting atpase, b 
chain] [gn:kdpb] 


[pn:potassium-transporting atpase, c 
chain] [gn:kdpc] 


[pn:sensor protein kdpd] [gn:kdpd] 


[pn:hypothetical protein] 


[pn:ornithine decarboxylase, 
inducible] [gn:spef] 


[pn:fatty acyl responsive regulator]
[pn:pyrophosphate phospho]
[pn:cysq protein] [gn:cysq]


[pn:hypothetical 64.8 kd protein in
msra-chpbi intergenic region]
[pn:hypothetical 21.0 kd protein in
bioh-gntt intergenic region]


[pn:hypothetical 43.2 kd protein in
ppia-nirb intergenic region]
apoprotein] [gn:nirb]
[pn:hypothetical protein]


[pn:hypothetical protein] 


[pn:urease]


[pn:30s ribosomal subunit protein 
s21] [gn:rpsu] 


[pn:rna polymerase sigma-70 factor]
[gn:rpod]


[pnxipb protein] [gnxlpb] 


[pn:hypothetical 24.9 kd protein in
tolc-ribb/htrp intergenic region]
[gn:ygib]


[pn:hypothetical protein]


[pn:hypothetical protein]


urease operon ured protein. 


[pn:urease beta subunit] [gn:ureb] 


urease accessory protein uref. 


[pn:urease accessory protein]
[pn:dna primase] [gn:dnag]


[pn:hypothetical protein]


[pn:hypothetical 26.5 kd protein in
[pn:hypothetical fimbrial-like protein
[pn:3,4-dihydroxy-2-butanone 4-
phosphate synthase] [gn:ribb]


[pn:hypothetical 29.9 kd protein in 
tolc-ribb intergenic region] [gn:ygid] 


[pn:hypothetical protein] 


[PN:hypothetical protein]
[OR:Synechocystis sp.] [SR:PCC
6803, , PCC 6803] [SR:PCC 6803, ]


[PN:hypothetical protein]
[OR:Synechocystis sp.] [SR:PCC
6803, , PCC 6803] [SR:PCC 6803, ]


[pn:hypothetical protein in marr
5'region] [gn:ydeb]


or:synechococcus pcc7942
pn:unknown le:4337 re:>4953
di:direct nt:orf205


[pn:hypothetical protein]


[pn:hypothetical protein] [gn:yneh]


[pn:hypothetical 41.5 kd protein in
psif-proc intergenic region] [gn:yaic]
[pn:hypothetical protein]
[pn:hypothetical protein]


[pn:hypothetical protein]


[de:mycobacterium tuberculosis
sequence v047.] [pn:hypothetical
protein mtv047.09c]
[pn:hypothetical protein] [gn:nl5nr]


b3041 


b3039 


ydeE 


S76228 


S77535 


ON 
CM 
»/0 

X) 


U59236 


wo 
CM 
»vO 

X3 


CM 
WO 


oo 
co 

o 


CM 
X) 


o 

CM 

«o 

X> 


ON 

ro 

X) 


ykuV 


CO 

x> 


AL022002 


yxjG 


Escherichia 
coli 


Escherichia 
coli 


Bacillus 
subtilis 


Synechocystis 
sp. 


Synechocystis 
sp. 


Escherichia 
coli 


Synechococcus
PCC7942


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Escherichia 
coli 


Bacillus 
subtilis 


Escherichia 
coli 


Mycobacterium
tuberculosis


Bacillus 
subtilis 


o 
o 

1 

o 

CM 
CM 


2.7(10)-125


5.7(10)-11


ON 
i 

O 

o 


oo 
vo 
• 

O 
oo 


CM 

r- 
■ 

o 

CM 
ro 


\o 
rM 

o 

\6 


ON 
NO 

o 
oo 

CM 


9.8(10)-137


oo 
o 


9.9(10)-231


On 
CM 

O 

vO 
CM 


r- 

o 
p 


1.1(10)-16 


o 


vO 
i 

o 
r- 


ro 

ON 

i 

O 
</~i 


uo 
on 

ON 


1230 


o 


I/O 

ON 


CM 
CM 

ro 


ON 
CM 


CM 
ON 

CM 


1645 


1338 


UO 
CM 
CM 


2225 


1268 


XO 
ro 

CM 


o 

CM 


3043 


NO 

o 


NO 
CM 
ON 


III 


VO 

r- 

CM 


CM 
O 

ro 


CM 
vO 
^" 


^1- 

NO 

ro 


ill 


co 
oo 

CM 


NO 

r- 
^ 


NO 

ro 


ON 
CM 
CO 


oo 
oo 


CM 

co 


CM 
Cl 


CM 


oo 
no 

NO 


o 


NO 
CO 


VO 
vO 
NO 


*r> 

CM 

oo 


NO 

o 

ON 


1386 


1092 


oo 

NO 


ON 

oo 


1428 


oo 

ON 


r- 
oo 

ON 


1464 


co 

NO 
ON 


1296 


1326 


1974 


o 

CM 


1128 


9084 


9085 


9086


9087 


9088 


9089 


9090 


9091 


9092 


9093 


9094 


9095 


9096


9097 


9098 


9099


9100 


3422 


3423 


3424 


3425 


3426 


3427 


3428 


3429 


3430 


3431 


3432 


3433 


3434 


3435 


3436 


3437 


3438 


ti- 

oo 

CO 

u l 

ro 
On 
OO 

r- 

vO 

ro 


4470033_c3_291


2223507_f1_1


!j 
i 

CM 
O 

VO 
vo 
O 
CM 


25682967_f1_13


3072831 8 J1J2 


1432191 J1J3 


13158461 Jl_25 


24120381J1J6 


11069415_f1_27


4379092_f1_28


32117968_f1_29


30494702 J1J2


29816950_f1_33


4180191_f2_41 


43053 18J2J4 


15023916_f2_45


CONTIG490 


CONTIG490


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491 


CONTIG491


CONTIG491 


CONTIG491 


CONTIG491






[pn:hypothetical 28.7 kd protein in 
marb-dcp intergenic region] 
[gn:yded] 


[pn:hypothetical protein] 


[pn:hypothetical protein] 


[pn:transcriptional regulator] 


[pn:hypothetical 21.7 kd protein in 
tktb-narq intergenic region] [gn:yffh] 


[pn:hypothetical protein] 


or:microbacterium ammoniaphilum
pn:unknown le:3382 re:>4972
di:complement


[pn:hypothetical protein] [gn:ynej]


[pn:multiple antibiotic resistance
protein] [gn:mara]


arac-like protein - azorhizobium 
caulinodans 


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:insertion element is150
hypothetical 33.3 kd protein]
[pn:hypothetical protein]


[pn:multiple antibiotic resistance
[pn:hypothetical protein in marb-dcp
intergenic region] [gn:ydef]


hypothetical protein 5 - rhizobium
sp. (strain ic3342)
[pn:hypothetical protein in marr
5'region] [gn:ydea]


[pn:multiple antibiotic resistance
protein marb] [gn:marb]


[pn:hypothetical protein]


[pn:hypothetical 9.4 kd protein in
sohb-topa intergenic region]


[pn:anthranilate synthase component
i] [gn:trpe]


[pn:tryptophan synthase, beta chain] 
[gn:trpb] 


[pn:hypothetical protein in tonb-trpa
intergenic region] [gn:ycic]


[pn:p14 protein]


[pn:alcohol/acetaldehyde
dehydrogenase] [gn:adhe]


[pn:cob(i)alamin adenosyltransferase]
[pn:hypothetical 8.6 kd protein in
amya-flie intergenic region]


[pn:sp]


[pn:ferrienterobactin receptor
precursor] [gn:fepa]


[pn:hypothetical protein in kch-tonb
intergenic region] [gn:ycii]


or:escherichia coli pn:cardiolipin
[pn:transcriptional regulatory protein
uhpa] [gn:uhpa]
[pn:hypothetical protein] [gn:yggs]


[pn:hypothetical protein] [gn:yggt]


[pn:hypothetical protein] [gn:yggw]


[pn:nucleoside permease nupg]
[gn:nupg]


[pn:transcriptional regulator crir]
[pn:hypothetical 12.7 kd protein in
[pn:hypothetical 21.1 kd protein in
[pn:hypothetical 5.4 kd protein in
spea-metk intergenic region]
[pn:hypothetical 49.4 kd protein in
tsr-mdob intergenic region]
[pn:hypothetical 33.7 kd protein in
avta-selb intergenic region] [gn:yiar]
[pn:hypothetical fimbrial-like protein
in fimz 5'region] [gn:ybcg]


[pn:hypothetical 25.7 kd fimbrial
chaperone in agai-mtr intergeni]
[gn:yrai]


[pn:hypothetical protein] [gn:yibf]


[pn:selb]


[pn:hypothetical transcriptional
regulator in avta-selb intergenic
region] [gn:yiaj]


[pn:hypothetical 17.5 kd protein in
avta-selb intergenic region]


[pn:xylulose kinase] [gn:xylb]
subunit] [gn:glyq]
subunit] [gn:glys]
[pn:hypothetical protein]


[pn:menaquinone biosynthesis
protein meng] [gn:meng]


[pn:transcriptional regulatory
protein] [gn:cpxr]


or:escherichia coli gn:soda le:<1
re:225 di:direct sr:escherichia coli
(strain k-12) (library: lambda from
kohara et al.


[pn:rhamnose permease] [gn:rhat] 


[pn:formate dehydrogenase-o, major
subunit] [gn:fdog]


[pn:formate dehydrogenase-o, major
subunit] [gn:fdog]


[pn:formate dehydrogenase-o, iron-
sulfur subunit] [gn:fdoh]


[pn:hypothetical 32 kd protein in
glna-fdhe intergenic region]


[pn:hypothetical 77.2 kd protein in
glna-fdhe intergenic region]


[pn:hypothetical 51.7 kd protein in
[pn:hypothetical protein]
[pn:menaquinone biosynthesis
protein mena] [gn:mena]
[pn:ferredoxin--nadp reductase]
[pn:l-rhamnose isomerase] [gn:rhaa]


[pn:rhamnulose-1-phosphate
aldolase] [gn:rhad]


[pn:ribose abc transporter]


[pn:hypothetical 12.3 kd protein in
rhad 3'region] [gn:yiil]


[pn:fdhe protein] [gn:fdhe]


[pn:hypothetical protein in hemh-gsk
intergenic region] [gn:ybac]


[pn:hypothetical 31.2 kd protein in
glna-fdhe intergenic region]


[pn:hypothetical protein]
[pn:heat shock protein hslu]
[gn:hslu]


[pn:hypothetical 21.8 kd protein in
tpia 3'region precursor] [gn:yiiq]


[pn:triosephosphate isomerase]
[gn:tpia]
[pn:high affinity ribose transport
[pn:high affinity ribose transport
protein] [gn:rbsc]


[pn:hypothetical 40.2 kd protein in
[pn:hypothetical 14.4 kd protein in
cpxa-pfka intergenic region]


[pn:hypothetical 32.9 kd protein in 
cpxa-pfka intergenic region] 


[pn:hypothetical 19.1 kd protein in
pola-hemn intergenic region]


[pn:hypothetical 31.9 kd protein in
glna-fdhe intergenic region]


[pn:hypothetical 15.9 kd protein in
glna-fdhe intergenic region]
[pn:periplasmic sulphate binding
protein] [gn:sbp]


[pn:cdp-diglyceride hydrolase]
[pn:65.4 kd gtp-binding protein in
glna-fdhe intergenic region]


[pn:hypothetical transcriptional
regulator in glna-fdhe intergenic
region] [gn:yihw]


[pn:hypothetical 32.8 kd protein in
glna-fdhe intergenic region]
[pn:hypothetical 8.6 kd protein in
[pn:hypothetical protein]
[pn:hypothetical 16.5 kd protein in
tpia-fpr intergenic region] [gn:yiir]


[pn:hypothetical 9.6 kd protein in 
glpf-hslu intergenic region] [gn:yiiu] 


ceob, ceob, similar to cytoplasmic
membrane proteins of the rnd 


[pn:hypothetical protein] 


[pn:hypothetical protein]


[pn:uracil permease] [gn:uraa] 


[pn:hypothetical protein] 


[pn:phosphoribosylaminoimidazole-
succinocarboxamide synthase]
[pn:hypothetical protein]


[pn:hypothetical protein] [gn:ypfh]


[pn:hypothetical 71.8 kd protein in
tktb-narq intergenic region] [gn:yffg]
[pn:sulfate transport system
[pn:sulfate transport atp-binding
[pn:conserved protein]


[pn:spermidine n1-acetyltransferase]
[gn:speg]


[pn:hypothetical protein]


[pn:anaerobic dimethyl sulfoxide 
reductase chain b] [gn:dmsb] 


[pn:hypothetical protein] 


hypothetical 40.7 kd protein in opde 
3'region (orf2). 


[pn:glycine betaine/carnitine/choline
abc transporter] [gn:yvbd]


[pn:osmoprotectant-binding protein]
[gn:yvbc]


[pn:hypothetical protein] [gn:ynfm] 


[pn:hypothetical protein]


coenzyme pqq synthesis protein d. 


coenzyme pqq synthesis protein f (ec 
3.4.99.-). 


[pn:acyl carrier protein 
phosphodiesterase] [gn:acpd] 


[pn:putative serine transporter]
dcp-noha intergenic region]


[pn:hypothetical protein]
[pn:hypothetical protein]


[pn:hypothetical 10.7 kd protein in
purt 5'region] [gn:yebg]


[pn:choline abc transporter]
[gn:prov]


[pn:hypothetical protein]


[pn:hypothetical protein]


[pn:yajd] 


hypothetical protein in pqqa 5'region 
(orf x) (fragment). 


coenzyme pqq synthesis protein c. 


coenzyme pqq synthesis protein e. 


[pn:ferric uptake regulation protein] 
[gn:fur] 


[pn:hypothetical protein] [gn:ybfn] 


[pn:hypothetical protein] 


mobilization protein mobl. 


[pn:hypothetical protein] [gn:sfma]


[pn:flagellar basal-body m-ring 
protein] [gn:flif] 


[pn:hypothetical protein]
rplm-hhoa intergenic region]
[pn:transcriptional regulator]
[pn:hypothetical protein] [gn:glob]


[pn:hypothetical protein] [gn:yafs]


[pn:hypothetical 37.8 kd protein in
ung 3'region] [gn:yfif]


[pn:50s ribosomal subunit protein
l13] [gn:rplm]


[pn:30s ribosomal subunit protein
s9] [gn:rpsi]


[pn:pts system, mannose-specific iic
component] [gn:many]


[pn:pts system, mannose-specific iid 
component] [gn:manz] 


[pn:hypothetical protein] 


[pn:phosphoheptose isomerase] 
[gn:gmha] 


[pn:hypothetical 32.2 kd protein in
vsr 5'region] [gn:yeda]


[pn:phosphohistidinoprotein-hexose
phosphotransferase] [gn:ptsh]


[pn:phosphoenolpyruvate-protein
phosphotransferase] [gn:ptsi]


hypothetical protein k - salmonella
typhimurium (fragment)